public inbox for [email protected]  
eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
143+ messages / 14 participants

* eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-06-23 20:25  Melanie Plageman <[email protected]>

From: Melanie Plageman @ 2025-06-23 20:25 UTC
  To: PostgreSQL Hackers <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>

Hi,

The attached patch set eliminates xl_heap_visible, the WAL record
emitted when a block of the heap is set all-visible/frozen in the
visibility map. Instead, it includes the information needed to update
the VM in the WAL record already emitted by the operation modifying
the heap page.

Currently COPY FREEZE and vacuum are the only operations that set the
VM. So, this patch modifies the xl_heap_multi_insert and xl_heap_prune
records.

The result is a dramatic reduction in WAL volume for these operations.
I've included numbers below.

I also think that it makes more sense to include changes to the VM in
the same WAL record as the changes that rendered the page all-visible.
In some cases, we will only set the page all-visible, but that is in
the context of the operation on the heap page which discovered that it
was all-visible. Therefore, I find this to be an improvement in clarity
as well as in performance.

This project is also the first step toward setting the VM on-access
for queries which do not modify the page. There are a few design
issues that must be sorted out for that project which I will detail
separately. Note that this patch set currently does not implement
setting the VM on-access.

The attached patch set isn't 100% polished. I think some of the
variable names and comments could use work, but I'd like to validate
the idea of doing this before doing a full polish. This is a summary
of what is in the set:

Patches:
0001 - 0002: cleanup
0003 - 0004: refactoring
0005: COPY FREEZE changes
0006: refactoring
0007: vacuum phase III changes
0008: vacuum phase I empty page changes
0009 - 0012: refactoring
0013: vacuum phase I normal page changes
0014: cleanup

Performance benefits of eliminating xl_heap_visible:

vacuum of table with index (DDL at bottom of email)
--
master -> patch
WAL records: 6682 -> 4459 = 33% reduction
WAL bytes: 405346 -> 303088 = 25% reduction

vacuum of table without index
--
master -> patch
WAL records: 4452 -> 2231 = 50% reduction
WAL bytes: 289016 -> 177978 = 38% reduction

COPY FREEZE of table without index
--
master -> patch
WAL records: 3672777 -> 1854589 = 50% reduction
WAL bytes: 841340339 -> 748545732 = 11% reduction (new pages need a
copy of the whole page)
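
The percentages above can be rechecked with a few lines of C. This is only a sketch for verifying the arithmetic; the helper name wal_reduction_pct is hypothetical and the inputs are the record and byte counts quoted in this message.

```c
#include <stdint.h>

/*
 * Percentage reduction going from "before" to "after", rounded to the
 * nearest whole percent. Returns 0 if there was no reduction (or if
 * "before" is 0, to avoid dividing by zero).
 */
int
wal_reduction_pct(uint64_t before, uint64_t after)
{
	if (before == 0 || after >= before)
		return 0;

	/* Adding before / 2 rounds to nearest instead of truncating. */
	return (int) (((before - after) * 100 + before / 2) / before);
}
```

For example, wal_reduction_pct(4452, 2231) gives 50 and wal_reduction_pct(841340339, 748545732) gives 11, matching the figures reported for vacuum without an index and for COPY FREEZE.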

table for vacuum example:
--
create table foo(a int, b numeric, c numeric) with (autovacuum_enabled= false);
insert into foo select i % 18, repeat('1', 400)::numeric, repeat('2',
400)::numeric from generate_series(1,40000)i;
-- don't make index for no-index case
create index on foo(a);
delete from foo where a = 1;
vacuum (verbose, process_toast false) foo;


copy freeze example:
--
-- create a data file
create table large(a int, b int) with (autovacuum_enabled = false,
fillfactor = 10);
insert into large SELECT generate_series(1,40000000)i, 1;
copy large to 'large.data';

-- example
BEGIN;
create table large(a int, b int) with (autovacuum_enabled = false,
fillfactor = 10);
COPY large FROM 'large.data' WITH (FREEZE);
COMMIT;

- Melanie


Attachments:

  [text/x-patch] v1-0002-Simplify-vacuum-VM-update-logging-counters.patch (2.9K, 2-v1-0002-Simplify-vacuum-VM-update-logging-counters.patch)
From 6cbbdd359ae4de835bbd77369b598885e8a279b2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 16:12:15 -0400
Subject: [PATCH v1 02/14] Simplify vacuum VM update logging counters

We can simplify the VM counters added in dc6acfd910b8 to
lazy_vacuum_heap_page() and lazy_scan_new_or_empty().

We won't invoke lazy_vacuum_heap_page() unless there are dead line
pointers, so we know the page can't be all-visible.

In lazy_scan_new_or_empty(), we only update the VM if the page-level
hint PD_ALL_VISIBLE is clear, and the VM bit cannot be set if the page
level bit is clear because a subsequent page update would fail to clear
the visibility map bit.

Simplify the logic for determining which log counters to increment based
on this knowledge.
---
 src/backend/access/heap/vacuumlazy.c | 32 +++++++++++-----------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09416450af9..c8da2f835c4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,17 +1900,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 										   VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/*
-			 * If the page wasn't already set all-visible and/or all-frozen in
-			 * the VM, count it as newly set for logging.
-			 */
-			if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-			{
-				vacrel->vm_new_visible_pages++;
-				vacrel->vm_new_visible_frozen_pages++;
-			}
-			else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
-				vacrel->vm_new_frozen_pages++;
+			/* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
+			Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+			(void) old_vmbits; /* Silence compiler */
+			/* Count the newly all-frozen pages for logging. */
+			vacrel->vm_new_visible_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 		}
 
 		freespace = PageGetHeapFreeSpace(page);
@@ -2930,20 +2925,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 									   vmbuffer, visibility_cutoff_xid,
 									   flags);
 
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		/* We know the page should not have been all-visible */
+		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+		(void) old_vmbits; /* Silence compiler */
+
+		/* Count the newly set VM page for logging */
+		if ((flags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 		{
 			vacrel->vm_new_visible_pages++;
 			if (all_frozen)
 				vacrel->vm_new_visible_frozen_pages++;
 		}
-
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 all_frozen)
-			vacrel->vm_new_frozen_pages++;
 	}
 
 	/* Revert to the previous phase information for error traceback */
-- 
2.34.1
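
The reasoning in the 0002 commit message can be illustrated with a small standalone model. This is a sketch, not code from the patch: the VMCounters struct and the function names are hypothetical stand-ins for the LVRelState counters, and it assumes the caller always requests VISIBILITYMAP_ALL_VISIBLE, as lazy_vacuum_heap_page() does on this path. It shows that once the old VM bits are known to be clear, the three-way branch and the simplified form count identically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Hypothetical stand-in for the three logging counters in LVRelState */
typedef struct VMCounters
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} VMCounters;

/* Three-way branch on the old VM bits, as in the removed lines */
void
count_old_style(VMCounters *c, uint8_t old_vmbits, bool all_frozen)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		c->new_visible++;
		if (all_frozen)
			c->new_visible_frozen++;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
		c->new_frozen++;
}

/* Simplified form: the old bits are asserted to be clear */
void
count_new_style(VMCounters *c, uint8_t old_vmbits, bool all_frozen)
{
	assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
	c->new_visible++;
	if (all_frozen)
		c->new_visible_frozen++;
}

/* True if both counter sets hold the same values */
bool
counters_equal(const VMCounters *a, const VMCounters *b)
{
	return a->new_visible == b->new_visible &&
		a->new_visible_frozen == b->new_visible_frozen &&
		a->new_frozen == b->new_frozen;
}
```

With old_vmbits equal to 0, the else-if arm of the old logic is unreachable, which is why the new_frozen counter never moves on these paths.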



  [text/x-patch] v1-0004-Introduce-unlogged-versions-of-VM-functions.patch (6.0K, 3-v1-0004-Introduce-unlogged-versions-of-VM-functions.patch)
From 9750354f2b7d7bd3afd38fca5e0ca2dd814a19a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v1 04/14] Introduce unlogged versions of VM functions

Future commits will eliminate usages of xl_heap_visible and incorporate
setting the VM into the WAL records making other changes to the heap
page. As a step toward this, add versions of the VM-update function and
its heap-specific wrapper that do not emit their own WAL.

These will be used in follow-on commits.
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++++++++++++++++
 src/backend/access/heap/visibilitymap.c | 45 +++++++++++++++++++++++++
 src/include/access/heapam.h             |  3 ++
 src/include/access/visibilitymap.h      |  2 ++
 4 files changed, 94 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc409fd3a60..15dc3d88843 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7868,6 +7868,50 @@ heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 							 InvalidXLogRecPtr, vmbuf, cutoff_xid, vmflags, set_heap_lsn);
 }
 
+/*
+ * Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
+ * provided vmflags in the provided vmbuf.
+ *
+ * Both the heap page and VM page should be pinned and exclusive locked.
+ * You must pass a VM buffer containing the correct page of the map
+ * corresponding to the passed in heap block.
+ *
+ * This should only be called in a critical section that also emits WAL (as
+ * needed) for both heap page changes and VM page changes.
+ */
+uint8
+heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+				 Buffer vmbuf, uint8 vmflags)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	Assert(BufferIsValid(heap_buf));
+	Assert(CritSectionCount > 0);
+
+	/* Check that we have the right heap page pinned */
+	if (BufferGetBlockNumber(heap_buf) != heap_blk)
+		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm");
+
+	/*
+	 * We must never end up with the VM bit set and the page-level
+	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+	 * modification would fail to clear the VM bit.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM. As of
+	 * Postgres 19, because heap page modifications are done in the same
+	 * critical section as setting the VM bits, that should no longer happen.
+	 */
+	if (!PageIsAllVisible(heap_page))
+	{
+		PageSetAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+	}
+
+	return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+}
+
 /*
  * heap_tuple_should_freeze
  *
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 45721399122..9f27ace0e1c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -317,6 +317,51 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9375296062f..5127fdb9c77 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
 									 TransactionId *NoFreezePageRelfrozenXid,
 									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+
+extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+							  Buffer vmbuf, uint8 vmflags);
 extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 									  Buffer vmbuf, TransactionId cutoff_xid,
 									  uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 4fa4f837535..5d0a9417c25 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags, bool set_heap_lsn);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.34.1
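
The bit arithmetic inside visibilitymap_set_vmbyte() can be tried outside the server with a plain byte array. The following is a simplified model, not the patch's code: the function names are hypothetical, there is no buffer manager or locking, and MAPSIZE is an assumption (8192-byte blocks minus a 24-byte page header, which is what a default build is believed to use).

```c
#include <assert.h>
#include <stdint.h>

#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03

#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4
#define MAPSIZE				8168	/* assumed: BLCKSZ 8192 - 24-byte header */
#define HEAPBLOCKS_PER_PAGE	(MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM page covers this heap block */
uint32_t
heapblk_to_mapblock(uint32_t heap_blk)
{
	return heap_blk / HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page */
uint32_t
heapblk_to_mapbyte(uint32_t heap_blk)
{
	return (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within the byte */
uint8_t
heapblk_to_offset(uint32_t heap_blk)
{
	return (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/*
 * Set flags for heap_blk in "map" (the contents of one VM page).
 * Returns the previous bits; *dirtied is 1 if the byte was changed.
 * Like the function in the patch, this only ever sets bits.
 */
uint8_t
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags, int *dirtied)
{
	uint32_t	map_byte = heapblk_to_mapbyte(heap_blk);
	uint8_t		map_offset = heapblk_to_offset(heap_blk);
	uint8_t		status;

	assert((flags & VM_VALID_BITS) == flags);
	assert(flags != VM_ALL_FROZEN);	/* frozen implies visible */

	status = (map[map_byte] >> map_offset) & VM_VALID_BITS;
	*dirtied = 0;
	if (flags != status)
	{
		map[map_byte] |= (uint8_t) (flags << map_offset);
		*dirtied = 1;
	}
	return status;
}
```

Under these assumptions heap block 5 lands in byte 1 at bit offset 2, so setting both flags on a zeroed map turns that byte into 0x0C, and a second identical call leaves the map untouched and returns 0x03.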



  [text/x-patch] v1-0001-Remove-unused-check-in-heap_xlog_insert.patch (1.3K, 4-v1-0001-Remove-unused-check-in-heap_xlog_insert.patch)
From 593d33896dcb618f806b911e80fd448fdacbba0a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 12 Jun 2025 15:38:37 -0400
Subject: [PATCH v1 01/14] Remove unused check in heap_xlog_insert()

8e03eb92e9ad54e2 reverted the commit 39b66a91bd which allowed freezing
in the heap_insert() code path but did not remove the corresponding
check in heap_xlog_insert(). This code is extraneous but not harmful.
However, cleaning it up makes it very clear that, as of now, we do not
support any freezing of pages in the heap_insert() path.
---
 src/backend/access/heap/heapam_xlog.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 30f4c2d3c67..fa94e104f1c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -508,9 +508,8 @@ heap_xlog_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
-		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
-			PageSetAllVisible(page);
+		/* This should not happen in the heap_insert() code path */
+		Assert(!(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET));
 
 		MarkBufferDirty(buffer);
 	}
-- 
2.34.1



  [text/x-patch] v1-0003-Introduce-heap-specific-wrapper-for-visibilitymap.patch (12.8K, 5-v1-0003-Introduce-heap-specific-wrapper-for-visibilitymap.patch)
From 904a31bb8f519f5a9e4b30d9010edf506cddad1f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:06:45 -0400
Subject: [PATCH v1 03/14] Introduce heap-specific wrapper for
 visibilitymap_set()

visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably contains some
layering violations.

For example, it sets the heap page's LSN when checksums/wal_log_hints
are enabled. However, the caller may not need to set PD_ALL_VISIBLE
(when it is already set) and thus may not have marked the buffer dirty.
visibilitymap_set() will still set the page LSN in this case, even when
it would have been correct for the caller to *not* mark the buffer
dirty.

Also, each caller must remember to set PD_ALL_VISIBLE when it is not
already set and to mark the buffer dirty. This commit introduces a
wrapper that does both, and adds a flag to visibilitymap_set() indicating
whether the heap page LSN should be set.
---
 src/backend/access/heap/heapam.c        | 62 ++++++++++++++++++-------
 src/backend/access/heap/heapam_xlog.c   |  2 +-
 src/backend/access/heap/vacuumlazy.c    | 60 ++++++------------------
 src/backend/access/heap/visibilitymap.c | 19 ++++----
 src/include/access/heapam.h             |  3 ++
 src/include/access/visibilitymap.h      |  2 +-
 6 files changed, 73 insertions(+), 75 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..dc409fd3a60 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
-		else if (all_frozen_set)
-			PageSetAllVisible(page);
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 		/*
 		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
+		 * We're already holding pin on the vmbuffer. It's fine to use
+		 * InvalidTransactionId here - this is only used when
+		 * HEAP_INSERT_FROZEN is specified, which intentionally violates
+		 * visibility rules.
 		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
+									 vmbuffer,
+									 InvalidTransactionId,
+									 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
@@ -7840,6 +7831,43 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 	return false;
 }
 
+/*
+ * Make the heap and VM page changes needed to set a page all-visible.
+ * Do not call in recovery.
+ */
+uint8
+heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+						 Buffer vmbuf, TransactionId cutoff_xid,
+						 uint8 vmflags)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		set_heap_lsn = false;
+
+	Assert(BufferIsValid(heap_buf));
+
+	/* Check that we have the right heap page pinned */
+	if (BufferGetBlockNumber(heap_buf) != heap_blk)
+		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
+
+	/*
+	 * We must never end up with the VM bit set and the page-level
+	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+	 * modification would fail to clear the VM bit. The reverse is possible,
+	 * though: the page-level bit can be set while the VM bit is clear if
+	 * checksums and wal_log_hints are not enabled.
+	 */
+	if (!PageIsAllVisible(heap_page))
+	{
+		PageSetAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		if (XLogHintBitIsNeeded())
+			set_heap_lsn = true;
+	}
+
+	return visibilitymap_set(rel, heap_blk, heap_buf,
+							 InvalidXLogRecPtr, vmbuf, cutoff_xid, vmflags, set_heap_lsn);
+}
+
 /*
  * heap_tuple_should_freeze
  *
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index fa94e104f1c..cfd4fc3327d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -298,7 +298,7 @@ heap_xlog_visible(XLogReaderState *record)
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 
 		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
+						  xlrec->snapshotConflictHorizon, vmbits, false);
 
 		ReleaseBuffer(vmbuffer);
 		FreeFakeRelcacheEntry(reln);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c8da2f835c4..5e662936dd7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1892,12 +1892,10 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				PageGetLSN(page) == InvalidXLogRecPtr)
 				log_newpage_buffer(buf, true);
 
-			PageSetAllVisible(page);
-			old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-										   InvalidXLogRecPtr,
-										   vmbuffer, InvalidTransactionId,
-										   VISIBILITYMAP_ALL_VISIBLE |
-										   VISIBILITYMAP_ALL_FROZEN);
+			old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+												  vmbuffer, InvalidTransactionId,
+												  VISIBILITYMAP_ALL_VISIBLE |
+												  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
 			/* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
@@ -2074,25 +2072,9 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, presult.vm_conflict_horizon,
+											  flags);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
@@ -2164,17 +2146,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		uint8		old_vmbits;
 
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
 		/*
 		 * Set the page all-frozen (and all-visible) in the VM.
 		 *
@@ -2183,11 +2154,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * was logged when the page's tuples were frozen.
 		 */
 		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, InvalidTransactionId,
+											  VISIBILITYMAP_ALL_VISIBLE |
+											  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * The page was likely already set all-visible in the VM. However,
@@ -2919,11 +2889,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		PageSetAllVisible(page);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buffer,
-									   InvalidXLogRecPtr,
-									   vmbuffer, visibility_cutoff_xid,
-									   flags);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
+											  vmbuffer, visibility_cutoff_xid,
+											  flags);
 
 		/* We know the page should not have been all-visible */
 		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..45721399122 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -232,9 +232,10 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * when a page that is already all-visible is being marked all-frozen.
  *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function. Except in recovery, caller should also pass the heap buffer.
+ * When checksums are enabled and we're not in recovery, if the heap page was
+ * modified, we must add the heap buffer to the WAL chain to protect it from
+ * being torn.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -245,7 +246,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 uint8
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
+				  uint8 flags, bool set_heap_lsn)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
@@ -259,16 +260,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 #endif
 
 	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
+	Assert(!(InRecovery && set_heap_lsn));
 	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
 
 	/* Must never set all_frozen bit without also setting all_visible bit */
 	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
 
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -301,10 +298,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				 * WAL record inserted above, so it would be incorrect to
 				 * update the heap page's LSN.
 				 */
-				if (XLogHintBitIsNeeded())
+				if (set_heap_lsn)
 				{
 					Page		heapPage = BufferGetPage(heapBuf);
 
+					Assert(XLogHintBitIsNeeded());
+
 					PageSetLSN(heapPage, recptr);
 				}
 			}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9..9375296062f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
 									 TransactionId *NoFreezePageRelfrozenXid,
 									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+									  Buffer vmbuf, TransactionId cutoff_xid,
+									  uint8 vmflags);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
 extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..4fa4f837535 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -36,7 +36,7 @@ extern uint8 visibilitymap_set(Relation rel,
 							   XLogRecPtr recptr,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
-							   uint8 flags);
+							   uint8 flags, bool set_heap_lsn);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.34.1
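
The decision that the 0003 wrapper moves out of visibilitymap_set() can be modeled in isolation. This is a minimal sketch under stated assumptions: ModelHeapPage and model_prepare_heap_page are hypothetical names, the two booleans stand in for PD_ALL_VISIBLE and the buffer's dirty state, and hint_bits_logged stands in for the result of XLogHintBitIsNeeded().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the heap page state the wrapper touches */
typedef struct ModelHeapPage
{
	bool		all_visible;	/* models PD_ALL_VISIBLE */
	bool		dirty;			/* models MarkBufferDirty() having run */
} ModelHeapPage;

/*
 * Set the page-level bit only if it is clear, and report whether the
 * heap page LSN should be updated. The LSN is requested only when the
 * page was actually modified and hint-bit changes are WAL-logged.
 */
bool
model_prepare_heap_page(ModelHeapPage *page, bool hint_bits_logged)
{
	bool		set_heap_lsn = false;

	if (!page->all_visible)
	{
		page->all_visible = true;
		page->dirty = true;
		if (hint_bits_logged)
			set_heap_lsn = true;
	}

	return set_heap_lsn;
}
```

A page that is already all-visible comes back untouched with set_heap_lsn false, which is the case the commit message describes as previously getting an LSN update even though the caller had no reason to dirty the buffer.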



  [text/x-patch] v1-0005-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (6.8K, 6-v1-0005-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
From b8dacf8fed00b3d1fcf59e61adb1541ba68746a0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:40:28 -0400
Subject: [PATCH v1 05/14] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, include the required update in the xl_heap_multi_insert
record.
---
 src/backend/access/heap/heapam.c       | 42 +++++++++++++++++---------
 src/backend/access/heap/heapam_xlog.c  | 37 ++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c |  5 +++
 3 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 15dc3d88843..3d9b114b4e8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2506,6 +2503,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-visible. And if we've frozen everything on the
+		 * page, update the visibility map. We're already holding a pin on the
+		 * vmbuffer.
+		 */
+		else if (all_frozen_set)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			heap_page_set_vm(relation,
+							 BufferGetBlockNumber(buffer), buffer,
+							 vmbuffer,
+							 VISIBILITYMAP_ALL_VISIBLE |
+							 VISIBILITYMAP_ALL_FROZEN);
+		}
+
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
 		 */
@@ -2552,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2614,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2624,22 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer. It's fine to use
-		 * InvalidTransactionId here - this is only used when
-		 * HEAP_INSERT_FROZEN is specified, which intentionally violates
-		 * visibility rules.
-		 */
 		if (all_frozen_set)
-			heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
-									 vmbuffer,
-									 InvalidTransactionId,
-									 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cfd4fc3327d..a0f3673621a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,40 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno,
+											  vmbuffer,
+											  VISIBILITYMAP_ALL_VISIBLE |
+											  VISIBILITYMAP_ALL_FROZEN);
+		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+		(void) old_vmbits; /* Silence compiler */
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
-- 
2.34.1



  [text/x-patch] v1-0009-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 7-v1-0009-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From 40308800989edf1821639cd18e0c2630f4417c22 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v1 09/14] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 115 +++++++++++++++++----------
 1 file changed, 74 insertions(+), 41 deletions(-)
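
To illustrate the control flow described above, here is a minimal
standalone sketch of "check for corruption first, do the normal VM
update only otherwise". It is a toy model, not the patch's code:
ToyPage, the TOY_VM_* bits, toy_fix_vm_corruption() and toy_update_vm()
are all hypothetical names, and buffer locking, WAL and the real
visibilitymap_* calls are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the heap page flag and its VM bits. */
#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02

typedef struct ToyPage
{
    bool        pd_all_visible; /* models PD_ALL_VISIBLE on the heap page */
    uint8_t     vm_bits;        /* models this page's bits in the VM */
} ToyPage;

/*
 * Returns true if an inconsistency was found and cleared, in which case
 * the caller should not go on to set VM bits for this page.
 */
static bool
toy_fix_vm_corruption(ToyPage *p, bool known_av, int nlpdead_items)
{
    /* Case 1: VM bit set although the page-level bit is clear. */
    if (known_av && !p->pd_all_visible && p->vm_bits != 0)
    {
        p->vm_bits = 0;
        return true;
    }

    /* Case 2: LP_DEAD items on a page that is marked all-visible. */
    if (nlpdead_items > 0 && p->pd_all_visible)
    {
        p->pd_all_visible = false;
        p->vm_bits = 0;
        return true;
    }

    return false;
}

/* Caller shape: corruption check first, normal update only otherwise. */
static void
toy_update_vm(ToyPage *p, bool known_av, int nlpdead_items,
              bool page_is_av, bool all_frozen)
{
    if (toy_fix_vm_corruption(p, known_av, nlpdead_items))
        return;                 /* just cleared corruption; leave VM alone */

    if (!known_av && page_is_av)
    {
        p->pd_all_visible = true;
        p->vm_bits = TOY_VM_ALL_VISIBLE |
            (all_frozen ? TOY_VM_ALL_FROZEN : 0);
    }
}
```

The early return in toy_update_vm() corresponds to the empty "if" branch
in lazy_scan_prune(): once corruption has been cleared, none of the
normal update cases run for that page.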

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b1ff49bee6b..8328cab0955 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,13 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,6 +1947,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2084,9 +2151,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2122,45 +2194,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.34.1



  [text/x-patch] v1-0006-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.1K, 8-v1-0006-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From 797f6f09cf7287af4b4e929e903e115a767df145 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v1 06/14] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)
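
As a rough illustration of the refactoring pattern used here (replace a
pointer to a large state struct with the two values actually needed, one
of them an out-parameter for error reporting), consider the standalone
sketch below. Everything in it is hypothetical: ToyXid,
toy_xid_precedes() and toy_page_is_all_visible() only mimic the shape of
the real function, the page is reduced to an array of xmin values, and
the xid comparison assumes normal (non-special) xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;        /* hypothetical stand-in for TransactionId */

#define TOY_INVALID_XID    0
#define TOY_INVALID_OFFSET 0

/* Simplified wraparound-aware "a precedes b"; assumes normal xids only. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Takes oldest_xmin and the logging slot explicitly, so it can be called
 * from code that has no vacuum state struct at hand.
 */
static bool
toy_page_is_all_visible(const ToyXid *xmins, int ntuples,
                        ToyXid oldest_xmin,
                        ToyXid *cutoff_xid,
                        uint16_t *logging_offnum)
{
    bool        all_visible = true;

    *cutoff_xid = TOY_INVALID_XID;

    for (int i = 0; i < ntuples; i++)
    {
        /* Publish the current offset for the caller's error callback. */
        *logging_offnum = (uint16_t) (i + 1);

        if (!toy_xid_precedes(xmins[i], oldest_xmin))
        {
            all_visible = false;
            break;
        }

        /* Track the newest xmin among the visible tuples. */
        if (*cutoff_xid == TOY_INVALID_XID ||
            toy_xid_precedes(*cutoff_xid, xmins[i]))
            *cutoff_xid = xmins[i];
    }

    /* Clear the offset once the page has been processed. */
    *logging_offnum = TOY_INVALID_OFFSET;

    return all_visible;
}
```

A vacuum-like caller would pass the address of its own offset field for
logging_offnum, which is what the patch does with &vacrel->offnum; a
caller without such state can pass a local variable instead.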

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5e662936dd7..0cf4a69c431 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2877,8 +2881,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3568,9 +3572,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible
+ * tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3578,9 +3589,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3603,7 +3616,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3627,9 +3640,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3650,7 +3663,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3685,7 +3698,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.34.1



  [text/x-patch] v1-0007-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (25.3K, 9-v1-0007-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
From 087566e214b67713390f742dfd825d330d3f8360 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v1 07/14] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is all-visible after vacuum's third phase, use the VM-related options
when emitting the xl_heap_prune record with the changes vacuum makes in
phase III.
---
 src/backend/access/heap/heapam_xlog.c  | 148 +++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 164 ++++++++++++++++---------
 src/backend/access/rmgrdesc/heapdesc.c |  13 +-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   3 +
 6 files changed, 308 insertions(+), 77 deletions(-)
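
The main-data layout this patch relies on (fixed header, then an
optional conflict horizon, then an optional vmflags byte, each announced
by a header flag) can be sketched in isolation as follows. This is only
a model of the idea: ToyPruneHeader, the TOY_HAS_* values, toy_pack()
and toy_unpack() are hypothetical and do not match the real
xl_heap_prune struct or the XLHP_* flag values, and "nonzero" is used as
a simplified test for "field present".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values, not the real XLHP_* definitions. */
#define TOY_HAS_CONFLICT_HORIZON 0x01
#define TOY_HAS_VMFLAGS          0x02

typedef struct ToyPruneHeader
{
    uint8_t     flags;
} ToyPruneHeader;

/* Writer: header first, then each optional field in a fixed order. */
static size_t
toy_pack(uint8_t *out, uint32_t conflict_xid, uint8_t vmflags)
{
    ToyPruneHeader hdr = {0};
    uint8_t    *p = out + sizeof(hdr);

    if (conflict_xid != 0)
    {
        hdr.flags |= TOY_HAS_CONFLICT_HORIZON;
        memcpy(p, &conflict_xid, sizeof(conflict_xid));
        p += sizeof(conflict_xid);
    }
    if (vmflags != 0)
    {
        hdr.flags |= TOY_HAS_VMFLAGS;
        memcpy(p, &vmflags, sizeof(vmflags));
        p += sizeof(vmflags);
    }

    memcpy(out, &hdr, sizeof(hdr));
    return (size_t) (p - out);
}

/* Reader: walk the same order, advancing only past fields that exist. */
static void
toy_unpack(const uint8_t *in, uint32_t *conflict_xid, uint8_t *vmflags)
{
    ToyPruneHeader hdr;
    const uint8_t *p = in;

    memcpy(&hdr, p, sizeof(hdr));
    p += sizeof(hdr);

    *conflict_xid = 0;
    *vmflags = 0;

    if (hdr.flags & TOY_HAS_CONFLICT_HORIZON)
    {
        memcpy(conflict_xid, p, sizeof(*conflict_xid));
        p += sizeof(*conflict_xid);
    }
    if (hdr.flags & TOY_HAS_VMFLAGS)
        memcpy(vmflags, p, sizeof(*vmflags));
}
```

The point of the fixed ordering is that the redo side never needs a
length field for the optional parts: the header flags alone tell it how
far to advance, which mirrors how heap_xlog_prune_freeze() steps
maindataptr past the conflict horizon before copying out vmflags.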

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a0f3673621a..bb6680c0467 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = (Page) BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +169,78 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 *
+		 * Prior to Postgres 19, it was possible for the page-level bit to be
+		 * set and the VM bit to be clear. This could happen if we crashed
+		 * after setting PD_ALL_VISIBLE but before setting bits in the VM.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * Setting PD_ALL_VISIBLE only forces us to update the heap page
+			 * LSN if checksums or wal_log_hints are enabled (in which case we
+			 * must). This exposes us to torn page hazards, but since we're
+			 * not inspecting the existing page contents in any way, we don't
+			 * care.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +251,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0cf4a69c431..32f21d20194 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2817,8 +2819,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		old_vmbits = 0;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2829,6 +2835,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2848,6 +2868,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		old_vmbits = heap_page_set_vm(vacrel->rel,
+									  blkno, buffer,
+									  vmbuffer, vmflags);
+
+		/* We know the page should not have been all-visible */
+		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+		(void) old_vmbits; /* Silence compiler */
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2857,7 +2892,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2866,48 +2904,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
-											  vmbuffer, visibility_cutoff_xid,
-											  flags);
-
-		/* We know the page should not have been all-visible */
-		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
-		(void) old_vmbits; /* Silence compiler */
-
-		/* Count the newly set VM page for logging */
-		if ((flags & VISIBILITYMAP_ALL_VISIBLE) != 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (all_frozen)
-				vacrel->vm_new_visible_frozen_pages++;
-		}
+			vacrel->vm_new_visible_frozen_pages++;
 	}
 
 	/* Revert to the previous phase information for error traceback */
@@ -3570,6 +3574,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3583,23 +3606,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3631,9 +3666,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3700,7 +3734,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets has changed, that's wrong */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5127fdb9c77..d2ac380bb64 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -343,6 +343,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -393,6 +399,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool vm_modified_heap_page,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* If the record should update the VM, this is the new value */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.34.1
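
For readers skimming the diff, the final check in
heap_page_is_all_visible_except_lpdead() reduces to comparing two offset
arrays. Below is a minimal standalone sketch of that comparison only. The
names SketchOffset and sketch_dead_offsets_match are hypothetical and not
part of the patch, and the sketch assumes both arrays are in ascending
offset order, as they are when collected by a forward scan of the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for OffsetNumber; not taken from the patch. */
typedef uint16_t SketchOffset;

/*
 * Return true if the LP_DEAD offsets found while scanning the page
 * ("found") are exactly the ones the caller is about to set LP_UNUSED
 * ("expected").  Both arrays are assumed to be in ascending order.
 */
static bool
sketch_dead_offsets_match(const SketchOffset *found, size_t nfound,
                          const SketchOffset *expected, size_t nexpected)
{
    /* The caller allows no dead items, so the page must have none. */
    if (nexpected == 0)
        return nfound == 0;

    /* A different count means the page changed underneath us. */
    if (nfound != nexpected)
        return false;

    assert(expected != NULL);

    /* Same count: the offsets themselves must be identical. */
    return memcmp(found, expected,
                  nexpected * sizeof(SketchOffset)) == 0;
}
```

With nexpected == 0 this degenerates to "no LP_DEAD items on the page",
which is what the heap_page_is_all_visible() wrapper relies on.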



  [text/x-patch] v1-0008-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (6.4K, 10-v1-0008-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 52dada80db2b5f5f6e5810c633d953d03ad10c05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v1 08/14] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 ++++--
 src/backend/access/heap/vacuumlazy.c | 64 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 54 insertions(+), 25 deletions(-)
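The full-page-image decision this patch adds to
log_heap_prune_and_freeze() can be read as a small pure function. The
sketch below is only an illustration: the SKETCH_* flag values and the
function name are hypothetical, not the real REGBUF_* definitions, and
hint_bits_need_wal stands in for the result of XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
enum
{
    SKETCH_REGBUF_STANDARD = 1 << 0,
    SKETCH_REGBUF_FORCE_IMAGE = 1 << 1,
    SKETCH_REGBUF_NO_IMAGE = 1 << 2
};

/*
 * Choose buffer registration flags for the heap block.  Forcing an FPI
 * takes precedence; otherwise the image can be skipped when the only
 * heap change is setting PD_ALL_VISIBLE and hint bits are not logged.
 */
static uint8_t
sketch_choose_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
                           bool set_pd_all_vis, bool hint_bits_need_wal)
{
    uint8_t     flags = SKETCH_REGBUF_STANDARD;

    if (force_heap_fpi)
        flags |= SKETCH_REGBUF_FORCE_IMAGE;
    else if (!do_prune && nfrozen == 0 &&
             (!set_pd_all_vis || !hint_bits_need_wal))
        flags |= SKETCH_REGBUF_NO_IMAGE;

    return flags;
}
```

The empty-page caller in lazy_scan_new_or_empty() would correspond to
force_heap_fpi being true exactly when the page LSN is invalid, i.e. the
page has never been WAL-logged.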

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 32f21d20194..b1ff49bee6b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1850,6 +1850,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 	if (PageIsEmpty(page))
 	{
+
 		/*
 		 * It seems likely that caller will always be able to get a cleanup
 		 * lock on an empty page.  But don't take any chances -- escalate to
@@ -1877,35 +1878,53 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
-			uint8		old_vmbits;
+			uint8		old_vmbits = 0;
+			uint8		new_vmbits = 0;
 
-			START_CRIT_SECTION();
+			new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
 
-			/* mark buffer dirty before writing a WAL record */
-			MarkBufferDirty(buf);
+			START_CRIT_SECTION();
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
-
-			old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
-												  vmbuffer, InvalidTransactionId,
-												  VISIBILITYMAP_ALL_VISIBLE |
-												  VISIBILITYMAP_ALL_FROZEN);
-			END_CRIT_SECTION();
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = heap_page_set_vm(vacrel->rel, blkno, buf,
+										  vmbuffer, new_vmbits);
 
 			/* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
 			Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
 			(void) old_vmbits; /* Silence compiler */
+
+			/* Should have set PD_ALL_VISIBLE and marked buf dirty */
+			Assert(BufferIsDirty(buf));
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
+
+			END_CRIT_SECTION();
+
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
@@ -2892,6 +2911,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d2ac380bb64..1fa6eb047fd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -399,6 +399,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.34.1



  [text/x-patch] v1-0010-Combine-vacuum-phase-I-VM-update-cases.patch (4.2K, 11-v1-0010-Combine-vacuum-phase-I-VM-update-cases.patch)
From 00d6a64f3c5fa4d87e59968b636941846e6c542b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v1 10/14] Combine vacuum phase I VM update cases

After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes
the code compact enough to start using to update the VM while pruning
and freezing.
---
 src/backend/access/heap/vacuumlazy.c | 71 +++++++++-------------------
 1 file changed, 22 insertions(+), 49 deletions(-)
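The combined condition in lazy_scan_prune() can be sketched as a function
from four booleans to the VM bits to set. This is only an illustration
under assumptions: the SKETCH_* bit values and the function name are
hypothetical, the two vm_says_* parameters stand in for
all_visible_according_to_vm and VM_ALL_FROZEN(), and the
"just cleared VM corruption" branch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only. */
enum
{
    SKETCH_VM_ALL_VISIBLE = 0x01,
    SKETCH_VM_ALL_FROZEN = 0x02
};

/*
 * Return the VM bits to set, or 0 if no VM update is needed.
 * page_all_frozen is only meaningful when page_all_visible is true.
 */
static uint8_t
sketch_vm_flags_to_set(bool page_all_visible, bool page_all_frozen,
                       bool vm_says_all_visible, bool vm_says_all_frozen)
{
    uint8_t     flags;

    if (!page_all_visible)
        return 0;

    /* Already all-visible in the VM and nothing new to say about frozen. */
    if (vm_says_all_visible && !(page_all_frozen && !vm_says_all_frozen))
        return 0;

    flags = SKETCH_VM_ALL_VISIBLE;
    if (page_all_frozen)
        flags |= SKETCH_VM_ALL_FROZEN;
    return flags;
}
```

Both former branches (set everything, or add only the frozen bit) fall
out of the same flag computation, which is what lets the duplicated code
go away.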

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8328cab0955..fdac36f0835 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2158,11 +2158,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM.  Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2174,6 +2189,12 @@ lazy_scan_prune(LVRelState *vacrel,
 											  flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2193,54 +2214,6 @@ lazy_scan_prune(LVRelState *vacrel,
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
-											  vmbuffer, InvalidTransactionId,
-											  VISIBILITYMAP_ALL_VISIBLE |
-											  VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
 }
 
 /*
-- 
2.34.1



  [text/x-patch] v1-0012-Update-VM-in-pruneheap.c.patch (12.0K, 12-v1-0012-Update-VM-in-pruneheap.c.patch)
From 53357a7e0f61e1ec00323c0cd14c8afb3f655b83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v1 12/14] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 81 +++++------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 89 deletions(-)
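Once the VM update lives in heap_page_prune_and_freeze(), the caller only
needs old_vmbits and new_vmbits to keep its logging counters right. A
rough standalone sketch of that bookkeeping follows. SketchVmCounters,
the SKETCH_* bit values and the function name are hypothetical and used
only for illustration; the counter fields loosely mirror the
vm_new_*_pages fields in LVRelState.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only. */
enum
{
    SKETCH_VM_ALL_VISIBLE = 0x01,
    SKETCH_VM_ALL_FROZEN = 0x02
};

/* Hypothetical counters, loosely mirroring vacrel->vm_new_*_pages. */
typedef struct SketchVmCounters
{
    long        new_visible;
    long        new_visible_frozen;
    long        new_frozen;
} SketchVmCounters;

/*
 * Count a page as newly all-visible (and perhaps all-frozen), or as
 * newly all-frozen only, based on the VM bits before and after.
 */
static void
sketch_count_vm_change(SketchVmCounters *c,
                       uint8_t old_vmbits, uint8_t new_vmbits)
{
    if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0 &&
        (new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0)
    {
        c->new_visible++;
        if ((new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
            c->new_visible_frozen++;
    }
    else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 &&
             (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
    {
        /* Page was already all-visible; only the frozen bit is new. */
        c->new_frozen++;
    }
}
```

When new_vmbits is 0 (no VM update happened) neither branch fires, so the
counters are untouched.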

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..425dcc77534 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
+												  vmbuffer, presult->vm_conflict_horizon,
+												  vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 592cd455cf4..5806207a674 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1940,7 +1940,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1956,7 +1955,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1983,6 +1983,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1991,10 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2086,70 +2083,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
-	 */
-	if (presult.vm_corruption)
-	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to be marked all-frozen, update the VM.  Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
-											  vmbuffer, presult.vm_conflict_horizon,
-											  flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0886867a161..534a63aab31 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,20 +234,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and any update of the
+	 * visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.34.1
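
For readers following the counter changes in lazy_scan_prune() above, here
is a minimal standalone sketch of how the old/new VM bit pair maps onto the
three vacuum counters. It is not code from the patch: VmCounters and
count_vm_transition() are hypothetical names, and the bool return value
stands in for the *vm_page_frozen out-parameter. The bit values are assumed
to match visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the values in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the LVRelState counters used for logging */
typedef struct VmCounters
{
	int64_t		new_visible_pages;
	int64_t		new_visible_frozen_pages;
	int64_t		new_frozen_pages;
} VmCounters;

/*
 * Classify one page's old -> new VM bit transition and bump the matching
 * counter.  Returns true if the page newly became all-frozen.
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		was_visible = (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
	bool		was_frozen = (old_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;
	bool		now_visible = (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
	bool		now_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;

	if (!was_visible && now_visible)
	{
		/* Page was newly set all-visible, possibly all-frozen as well */
		c->new_visible_pages++;
		if (now_frozen)
		{
			c->new_visible_frozen_pages++;
			return true;
		}
		return false;
	}

	if (!was_frozen && now_frozen)
	{
		/* all-frozen must never be set without all-visible */
		assert(now_visible);
		c->new_frozen_pages++;
		return true;
	}

	/* No bits were newly set, so nothing to count */
	return false;
}
```

The point of comparing old_vmbits with new_vmbits directly is that the
caller no longer needs a return value from a separate VM-setting call to
keep the counters accurate.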



  [text/x-patch] v1-0014-Remove-xl_heap_visible-entirely.patch (19.8K, 13-v1-0014-Remove-xl_heap_visible-entirely.patch)
From 51ce0c717152c316a29ce97edf6dfd8f720c3cba Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v1 14/14] Remove xl_heap_visible entirely

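xl_heap_visible now has no users, so remove the record along with
log_heap_visible(), heap_xlog_visible(), and heap_page_set_vm_and_log().
The old visibilitymap_set() goes away too, and visibilitymap_set_vmbyte()
takes over its name.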

ci-os-only:
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  83 +------------
 src/backend/access/heap/heapam_xlog.c    | 149 +----------------------
 src/backend/access/heap/visibilitymap.c  | 101 +--------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/include/access/heapam.h              |   3 -
 src/include/access/heapam_xlog.h         |   6 -
 src/include/access/visibilitymap.h       |  10 +-
 9 files changed, 12 insertions(+), 354 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3d9b114b4e8..6f134dfd535 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -7845,43 +7846,6 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 	return false;
 }
 
-/*
- * Make the heap and VM page changes needed to set a page all-visible.
- * Do not call in recovery.
- */
-uint8
-heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
-						 Buffer vmbuf, TransactionId cutoff_xid,
-						 uint8 vmflags)
-{
-	Page		heap_page = BufferGetPage(heap_buf);
-	bool		set_heap_lsn = false;
-
-	Assert(BufferIsValid(heap_buf));
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferGetBlockNumber(heap_buf) != heap_blk)
-		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
-
-	/*
-	 * We must never end up with the VM bit set and the page-level
-	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
-	 * modification would fail to clear the VM bit. Though it is possible for
-	 * the page-level bit to be set and the VM bit to be clear if checksums
-	 * and wal_log_hints are not enabled.
-	 */
-	if (!PageIsAllVisible(heap_page))
-	{
-		PageSetAllVisible(heap_page);
-		MarkBufferDirty(heap_buf);
-		if (XLogHintBitIsNeeded())
-			set_heap_lsn = true;
-	}
-
-	return visibilitymap_set(rel, heap_blk, heap_buf,
-							 InvalidXLogRecPtr, vmbuf, cutoff_xid, vmflags, set_heap_lsn);
-}
-
 /*
  * Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
  * provided vmflags in the provided vmbuf.
@@ -7923,7 +7887,7 @@ heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 		MarkBufferDirty(heap_buf);
 	}
 
-	return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+	return visibilitymap_set(rel, heap_blk, vmbuf, vmflags);
 }
 
 /*
@@ -8865,49 +8829,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index bb6680c0467..c64fc39bc01 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -273,7 +273,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -284,142 +284,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits, false);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
 
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
@@ -799,10 +663,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno,
-											  vmbuffer,
-											  VISIBILITYMAP_ALL_VISIBLE |
-											  VISIBILITYMAP_ALL_FROZEN);
+		old_vmbits = visibilitymap_set(reln, blkno,
+									   vmbuffer,
+									   VISIBILITYMAP_ALL_VISIBLE |
+									   VISIBILITYMAP_ALL_FROZEN);
 		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
 		(void) old_vmbits; /* Silence compiler */
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -1384,9 +1248,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 9f27ace0e1c..a24554fe191 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,103 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap buffer.
- * When checksums are enabled and we're not in recovery, if the heap page was
- * modified, we must add the heap buffer to the WAL chain to protect it from
- * being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags, bool set_heap_lsn)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(!(InRecovery && set_heap_lsn));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (set_heap_lsn)
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					Assert(XLogHintBitIsNeeded());
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
@@ -325,8 +228,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * making any changes needed to the associated heap page.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e35b4adf38d..c404b794fda 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,9 +365,6 @@ extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
 
 extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 							  Buffer vmbuf, uint8 vmflags);
-extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
-									  Buffer vmbuf, TransactionId cutoff_xid,
-									  uint8 vmflags);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
 extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..9a61434b881 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -495,11 +494,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 5d0a9417c25..20141e3e805 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -31,14 +31,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags, bool set_heap_lsn);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.34.1
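
As a reading aid for the visibilitymap_set() changes above, here is a rough
sketch of the bit arithmetic that both the removed and the surviving
function rely on: two bits per heap block, four heap blocks per map byte.
This is a toy model, not PostgreSQL code. TOY_MAPSIZE and the toy_* helpers
are hypothetical, the real map size is derived from the block size and page
header, and all locking, buffer handling, and WAL logging is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the values in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE 4

/* Hypothetical toy size; the real value depends on BLCKSZ */
#define TOY_MAPSIZE 16
#define TOY_HEAPBLOCKS_PER_PAGE (TOY_MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which byte of the map page holds this heap block's bits */
static uint32_t
toy_mapbyte(uint32_t heapBlk)
{
	return (heapBlk % TOY_HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within that byte */
static uint8_t
toy_mapoffset(uint32_t heapBlk)
{
	return (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/*
 * Set flags for heapBlk in the map and return the bits that were set
 * before the call, mirroring the "returns old status" contract.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = toy_mapbyte(heapBlk);
	uint8_t		mapOffset = toy_mapoffset(heapBlk);
	uint8_t		status;

	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
		map[mapByte] |= (uint8_t) (flags << mapOffset);

	return status;
}
```

Because the old status comes back to the caller, redo routines such as
heap_xlog_prune_freeze() can compare it with the requested flags and bump
the VM page LSN only when the byte actually changed.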



  [text/x-patch] v1-0011-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 14-v1-0011-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 5266974bf5a16b05b8c8bac33f4630ed1f1552e1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v1 11/14] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 78 +++----------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 73 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fdac36f0835..592cd455cf4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,13 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1947,65 +1940,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2062,11 +1996,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2151,10 +2088,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1fa6eb047fd..0886867a161 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -41,6 +41,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -246,6 +247,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -385,6 +387,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.34.1



  [text/x-patch] v1-0013-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (24.7K, 15-v1-0013-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From f106462fefde3c18ae5767c879f2cc6026748938 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v1 13/14] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
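A minimal sketch, not part of the patch, of how the combined record's
snapshot conflict horizon is chosen in the pruneheap.c hunk below. It is
a standalone model under stated assumptions: Xid, ToyPruneInputs and the
toy_* functions are hypothetical names invented for illustration, and the
two helpers only imitate TransactionIdFollows() and
TransactionIdRetreat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;			/* stand-in for TransactionId */

#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

/* Hypothetical bundle of the local state the real function consults. */
typedef struct ToyPruneInputs
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		all_frozen_except_lp_dead;
	bool		blk_known_av;
	bool		setting_all_frozen; /* vmflags has the all-frozen bit */
	Xid			visibility_cutoff_xid;
	Xid			oldest_xmin;
	Xid			latest_xid_removed;
} ToyPruneInputs;

/* Modulo-2^32 comparison, modeled on TransactionIdFollows(). */
static bool
toy_xid_follows(Xid a, Xid b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one XID, skipping special XIDs, like TransactionIdRetreat(). */
static Xid
toy_xid_retreat(Xid xid)
{
	do
		xid--;
	while (xid < FIRST_NORMAL_XID);
	return xid;
}

static Xid
toy_conflict_xid(const ToyPruneInputs *in)
{
	Xid			conflict_xid = INVALID_XID;

	assert(in != NULL);

	/* VM update, or freezing a page that ends up all-frozen */
	if (in->do_set_vm || (in->do_freeze && in->all_frozen_except_lp_dead))
		conflict_xid = in->visibility_cutoff_xid;
	/* Freezing only part of the page: pessimistic OldestXmin - 1 */
	else if (in->do_freeze)
		conflict_xid = toy_xid_retreat(in->oldest_xmin);

	/* Removed tuples with a younger xmax take precedence */
	if (toy_xid_follows(in->latest_xid_removed, conflict_xid))
		conflict_xid = in->latest_xid_removed;

	/* Already all-visible page merely being marked all-frozen */
	if (!in->do_prune && !in->do_freeze && in->do_set_vm &&
		in->blk_known_av && in->setting_all_frozen)
		conflict_xid = INVALID_XID;

	return conflict_xid;
}
```

The order of the branches follows the hunk: the cutoff is picked first,
the removed-tuple XID can only raise it, and the final check can only
reset it to invalid.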
 src/backend/access/heap/pruneheap.c  | 384 +++++++++++++++------------
 src/backend/access/heap/vacuumlazy.c |  30 ---
 src/include/access/heapam.h          |  15 +-
 3 files changed, 223 insertions(+), 206 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 425dcc77534..2d9624a246e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 *
 	 * all_frozen should only be considered valid if all_visible is also set;
 	 * we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
  * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
  * contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -496,29 +513,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping.
 	 */
-	if (prstate.freeze)
+	if (prstate.freeze || prstate.update_vm)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -534,12 +549,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -827,6 +845,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -848,13 +928,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
 		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * the buffer dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -868,7 +948,23 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_prune || do_freeze)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = heap_page_set_vm(relation, blockno, buffer,
+										  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit VM update WAL */
+				vmflags = 0;
+			}
+		}
 
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
@@ -885,35 +981,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * That's because we won't have maintained the
+			 * visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with a younger xmax than our so far
+			 * calculated conflict_xid, we must use this as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -925,124 +1043,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	presult->hastup = prstate.hastup;
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
-												  vmbuffer, presult->vm_conflict_horizon,
-												  vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1624,8 +1673,13 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
+	if (prstate->freeze || prstate->update_vm)
 	{
 		bool		totally_frozen;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5806207a674..7d74f8fc0f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2019,34 +2019,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2080,8 +2052,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 534a63aab31..e35b4adf38d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,19 +234,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.34.1




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-06-26 22:04  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-06-26 22:04 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Mon, Jun 23, 2025 at 4:25 PM Melanie Plageman
<[email protected]> wrote:
>
> The attached patch set eliminates xl_heap_visible, the WAL record
> emitted when a block of the heap is set all-visible/frozen in the
> visibility map. Instead, it includes the information needed to update
> the VM in the WAL record already emitted by the operation modifying
> the heap page.

Rebased in light of recent changes on master:

0001: cleanup
0002: preparatory work
0003: eliminate xl_heap_visible for COPY FREEZE
0004 - 0005: eliminate xl_heap_visible for vacuum's phase III
0006: eliminate xl_heap_visible for vacuum phase I empty pages
0007 - 0010: preparatory refactoring
0011: eliminate xl_heap_visible from vacuum phase I prune/freeze
0012: remove xl_heap_visible

- Melanie


Attachments:

  [text/x-patch] v2-0002-Introduce-unlogged-versions-of-VM-functions.patch (5.9K, 2-v2-0002-Introduce-unlogged-versions-of-VM-functions.patch)
From 3b9cbaac3b40976ef04ead3e2500f24d8938bda8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v2 02/12] Introduce unlogged versions of VM functions

Future commits will eliminate usages of xl_heap_visible and incorporate
setting the VM into the WAL records making other changes to the heap
page. As a step toward this, add versions of the function which updates
the VM, and of its heap-specific wrapper, which do not emit their own WAL.

These will be used in follow-on commits.
---
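A minimal sketch, not part of the patch, of the bit manipulation that
visibilitymap_set_vmbyte() performs in the diff below. It is a standalone
model: toy_vm_set() and its flat byte-array map are hypothetical, and the
toy ignores the split of the map into pages that HEAPBLK_TO_MAPBLOCK()
and HEAPBLK_TO_MAPBYTE() handle in the real code. The flag values are the
ones defined in visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/*
 * Hypothetical toy: OR 'flags' into the two bits belonging to heap_blk and
 * return the bits that were set beforehand.  *dirtied is set to 1 only if
 * the requested flags differ from the previous status, mirroring the
 * MarkBufferDirty() call in the patch.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags, int *dirtied)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	/* Flags must be valid; this function never clears bits */
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Never set all-frozen without all-visible */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
	{
		map[map_byte] |= (uint8_t) (flags << map_offset);
		*dirtied = 1;
	}

	return status;
}
```

Returning the previous status is what lets a caller compare it with the
requested flags and skip emitting WAL for the VM when nothing changed.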
 src/backend/access/heap/heapam.c        | 44 ++++++++++++++++++++++++
 src/backend/access/heap/visibilitymap.c | 45 +++++++++++++++++++++++++
 src/include/access/heapam.h             |  3 ++
 src/include/access/visibilitymap.h      |  2 ++
 4 files changed, 94 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 112f946dab0..d125787fcb6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7898,6 +7898,50 @@ heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 	return old_vmbits;
 }
 
+/*
+ * Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
+ * provided vmflags in the provided vmbuf.
+ *
+ * Both the heap page and VM page should be pinned and exclusive locked.
+ * You must pass a VM buffer containing the correct page of the map
+ * corresponding to the passed in heap block.
+ *
+ * This should only be called in a critical section that also emits WAL (as
+ * needed) for both heap page changes and VM page changes.
+ */
+uint8
+heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+				 Buffer vmbuf, uint8 vmflags)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	Assert(BufferIsValid(heap_buf));
+	Assert(CritSectionCount > 0);
+
+	/* Check that we have the right heap page pinned */
+	if (BufferGetBlockNumber(heap_buf) != heap_blk)
+		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm");
+
+	/*
+	 * We must never end up with the VM bit set and the page-level
+	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+	 * modification would fail to clear the VM bit.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM. Since
+	 * Postgres 19, since heap page modifications are done in the same
+	 * critical section as setting the VM bits, that should no longer happen.
+	 */
+	if (!PageIsAllVisible(heap_page))
+	{
+		PageSetAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+	}
+
+	return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+}
+
 /*
  * heap_tuple_should_freeze
  *
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index c57632168c7..cabd0fa0880 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -300,6 +300,51 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9375296062f..5127fdb9c77 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
 									 TransactionId *NoFreezePageRelfrozenXid,
 									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+
+extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+							  Buffer vmbuf, uint8 vmflags);
 extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 									  Buffer vmbuf, TransactionId cutoff_xid,
 									  uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 4c7472e0b51..91ef3705e84 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.34.1
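
The bit arithmetic in visibilitymap_set_vmbyte() above can be tried outside
the server with a small standalone model. This is only a sketch: the toy_*
names are hypothetical, and TOY_MAPSIZE assumes a default build (8192-byte
blocks with a 24-byte page header, two VM bits per heap block). It mirrors
the hunk's logic of reading the old status, OR-ing in the new flags, and
reporting whether the buffer would have been dirtied.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the VISIBILITYMAP_* flag values (assumed, not included). */
#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02
#define TOY_VM_VALID_BITS  0x03

#define TOY_BITS_PER_HEAPBLOCK  2
#define TOY_HEAPBLOCKS_PER_BYTE 4
/* Assumption: 8192-byte block minus a 24-byte page header. */
#define TOY_MAPSIZE             8168
#define TOY_HEAPBLOCKS_PER_PAGE (TOY_MAPSIZE * TOY_HEAPBLOCKS_PER_BYTE)

/* Which VM block holds the bits for this heap block. */
static uint32_t
toy_heapblk_to_mapblock(uint32_t heap_blk)
{
	return heap_blk / TOY_HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM block. */
static uint32_t
toy_heapblk_to_mapbyte(uint32_t heap_blk)
{
	return (heap_blk % TOY_HEAPBLOCKS_PER_PAGE) / TOY_HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of the heap block's two bits within that byte. */
static uint8_t
toy_heapblk_to_offset(uint32_t heap_blk)
{
	return (uint8_t) ((heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK);
}

/*
 * Set flags for heap_blk in map (the contents of one VM block). Returns the
 * previous status bits; *dirtied reports whether the buffer would have been
 * marked dirty. Bits are only ever added, never cleared.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags, int *dirtied)
{
	uint32_t	map_byte = toy_heapblk_to_mapbyte(heap_blk);
	uint8_t		map_offset = toy_heapblk_to_offset(heap_blk);
	uint8_t		status;

	assert((flags & TOY_VM_VALID_BITS) == flags);
	assert(flags != TOY_VM_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & TOY_VM_VALID_BITS;
	*dirtied = 0;
	if (flags != status)
	{
		map[map_byte] |= (uint8_t) (flags << map_offset);
		*dirtied = 1;
	}
	return status;
}
```

For heap block 5, the model picks byte 1 and bit offset 2, so setting
all-visible turns that byte into 0x04 and adding all-frozen turns it into
0x0C. Because the comparison is flags != status while the update is an OR,
passing only all-visible for a block that is already all-frozen reports the
buffer as dirtied without changing the byte.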



  [text/x-patch] v2-0001-Introduce-heap-specific-wrapper-for-visibilitymap.patch (16.0K, 3-v2-0001-Introduce-heap-specific-wrapper-for-visibilitymap.patch)
From 44370f480a1da1c51640faa5098ef127be7f3092 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 26 Jun 2025 15:57:53 -0400
Subject: [PATCH v2 01/12] Introduce heap-specific wrapper for
 visibilitymap_set()

visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably breaks a few of the
coding rules for modifying and WAL logging buffers set out in
access/transam/README.

In several of the places where visibilitymap_set() is called, setting
the heap page PD_ALL_VISIBLE and marking the buffer dirty are done
outside of a critical section.

In some places before visibilitymap_set() is called, MarkBufferDirty()
is used when MarkBufferDirtyHint() would be appropriate.

And in some places where PD_ALL_VISIBLE is already set and we therefore
don't mark the buffer dirty, visibilitymap_set() will still set the heap
page LSN when checksums/wal_log_hints are enabled -- even though it was
correct not to mark the buffer dirty.

Besides all of these issues, having these operations open-coded all over
the place is error-prone. This commit introduces a wrapper that does the
correct operations to the heap page itself and invokes
visibilitymap_set() to make changes to the VM page.
---
 src/backend/access/heap/heapam.c        | 92 ++++++++++++++++++++-----
 src/backend/access/heap/heapam_xlog.c   |  2 +-
 src/backend/access/heap/vacuumlazy.c    | 66 +++++-------------
 src/backend/access/heap/visibilitymap.c | 58 ++++++----------
 src/include/access/heapam.h             |  3 +
 src/include/access/visibilitymap.h      |  2 +-
 6 files changed, 117 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..112f946dab0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
-		else if (all_frozen_set)
-			PageSetAllVisible(page);
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 		/*
 		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
+		 * We're already holding pin on the vmbuffer. It's fine to use
+		 * InvalidTransactionId as the cutoff_xid here - this is only used
+		 * when HEAP_INSERT_FROZEN is specified, which intentionally violates
+		 * visibility rules.
 		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
+									 vmbuffer,
+									 InvalidTransactionId,
+									 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
@@ -7840,6 +7831,73 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 	return false;
 }
 
+/*
+ * Make the heap and VM page changes needed to set a page all-visible.
+ * Do not call in recovery.
+ */
+uint8
+heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+						 Buffer vmbuf, TransactionId cutoff_xid,
+						 uint8 vmflags)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		set_heap_lsn = false;
+	XLogRecPtr	recptr = InvalidXLogRecPtr;
+	uint8		old_vmbits = 0;
+
+	Assert(BufferIsValid(heap_buf));
+
+	START_CRIT_SECTION();
+
+	/* Check that we have the right heap page pinned */
+	if (BufferGetBlockNumber(heap_buf) != heap_blk)
+		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
+
+	/*
+	 * We must never end up with the VM bit set and the page-level
+	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+	 * modification would fail to clear the VM bit. The reverse is possible,
+	 * however: the page-level bit can be set while the VM bit is clear if
+	 * checksums and wal_log_hints are not enabled.
+	 */
+	if (!PageIsAllVisible(heap_page))
+	{
+		PageSetAllVisible(heap_page);
+
+		/*
+		 * Buffer will usually be dirty from other changes, so it is worth the
+		 * extra check
+		 */
+		if (!BufferIsDirty(heap_buf))
+		{
+			if (XLogHintBitIsNeeded())
+				MarkBufferDirty(heap_buf);
+			else
+				MarkBufferDirtyHint(heap_buf, true);
+		}
+
+		set_heap_lsn = XLogHintBitIsNeeded();
+	}
+
+	old_vmbits = visibilitymap_set(rel, heap_blk, heap_buf,
+								   &recptr, vmbuf, cutoff_xid, vmflags);
+
+	/*
+	 * If we modified the heap page and data checksums are enabled (or
+	 * wal_log_hints=on), we need to protect the heap page from being torn.
+	 *
+	 * If not, then we must *not* update the heap page's LSN. In this case,
+	 * the FPI for the heap page was omitted from the WAL record inserted by
+	 * visibilitymap_set(), so it would be incorrect to update the heap page's
+	 * LSN.
+	 */
+	if (set_heap_lsn)
+		PageSetLSN(heap_page, recptr);
+
+	END_CRIT_SECTION();
+
+	return old_vmbits;
+}
+
 /*
  * heap_tuple_should_freeze
  *
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..f2bc1bd06ee 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -297,7 +297,7 @@ heap_xlog_visible(XLogReaderState *record)
 		reln = CreateFakeRelcacheEntry(rlocator);
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+		visibilitymap_set(reln, blkno, InvalidBuffer, &lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
 		ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a42e17aec2..c0608af7d29 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1874,9 +1874,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		{
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
-			MarkBufferDirty(buf);
-
 			/*
 			 * It's possible that another backend has extended the heap,
 			 * initialized the page, and then failed to WAL-log the page due
@@ -1888,14 +1885,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			 */
 			if (RelationNeedsWAL(vacrel->rel) &&
 				PageGetLSN(page) == InvalidXLogRecPtr)
+			{
+				MarkBufferDirty(buf);
 				log_newpage_buffer(buf, true);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+									 vmbuffer, InvalidTransactionId,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
 			/* Count the newly all-frozen pages for logging */
@@ -2069,25 +2067,9 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, presult.vm_conflict_horizon,
+											  flags);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
@@ -2159,17 +2141,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		uint8		old_vmbits;
 
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
 		/*
 		 * Set the page all-frozen (and all-visible) in the VM.
 		 *
@@ -2178,11 +2149,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * was logged when the page's tuples were frozen.
 		 */
 		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, InvalidTransactionId,
+											  VISIBILITYMAP_ALL_VISIBLE |
+											  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * The page was likely already set all-visible in the VM. However,
@@ -2913,11 +2883,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
+		heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
+								 vmbuffer, visibility_cutoff_xid,
+								 flags);
 
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..c57632168c7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -222,29 +222,31 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 /*
  *	visibilitymap_set - set bit(s) on a previously pinned page
  *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function. Except in recovery, caller should also pass the heap buffer.
+ * When checksums are enabled and we're not in recovery, we must add the heap
+ * buffer to the WAL chain to protect it from being torn.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
  * any I/O.
  *
- * Returns the state of the page's VM bits before setting flags.
+ * cutoff_xid is the largest xmin on the page being marked all-visible; it is
+ * needed for Hot Standby, and can be InvalidTransactionId if the page
+ * contains no tuples.  It can also be set to InvalidTransactionId when a page
+ * that is already all-visible is being marked all-frozen.
+ *
+ * If we're in recovery, recptr points to the LSN of the XLOG record we're
+ * replaying and the VM page LSN is advanced to this LSN. During normal
+ * running, we'll generate a new XLOG record for the changes to the VM and set
+ * the VM page LSN. We will return this LSN in recptr, and the caller may use
+ * this to set the heap page LSN.
+ *
+ * Returns the state of the page's VM bits before setting flags.
  */
 uint8
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
+				  XLogRecPtr *recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
@@ -258,17 +260,13 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
 #endif
 
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+	Assert(InRecovery || XLogRecPtrIsInvalid(*recptr));
 	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
 	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
 
 	/* Must never set all_frozen bit without also setting all_visible bit */
 	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
 
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -287,28 +285,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 
 		if (RelationNeedsWAL(rel))
 		{
-			if (XLogRecPtrIsInvalid(recptr))
+			if (XLogRecPtrIsInvalid(*recptr))
 			{
 				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
+				*recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
 			}
-			PageSetLSN(page, recptr);
+			PageSetLSN(page, *recptr);
 		}
 
 		END_CRIT_SECTION();
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9..9375296062f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
 									 TransactionId *NoFreezePageRelfrozenXid,
 									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+									  Buffer vmbuf, TransactionId cutoff_xid,
+									  uint8 vmflags);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
 extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..4c7472e0b51 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -33,7 +33,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
 extern uint8 visibilitymap_set(Relation rel,
 							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
+							   XLogRecPtr *recptr,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
-- 
2.34.1
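
The heap-page side of heap_page_set_vm_and_log() in the patch above comes
down to three decisions: whether to set PD_ALL_VISIBLE, how to dirty the
buffer, and whether to stamp the heap page LSN afterwards. A minimal sketch
of that decision table as a pure function follows. The type and function
names are hypothetical, and hint_bits_need_wal stands in for
XLogHintBitIsNeeded().

```c
#include <stdbool.h>

/* How the heap buffer would be dirtied, if at all. */
typedef enum
{
	TOY_DIRTY_NONE,				/* leave the buffer as it is */
	TOY_DIRTY_FULL,				/* corresponds to MarkBufferDirty() */
	TOY_DIRTY_HINT				/* corresponds to MarkBufferDirtyHint() */
} ToyDirtyAction;

typedef struct
{
	bool		set_all_visible;	/* set PD_ALL_VISIBLE on the heap page */
	ToyDirtyAction dirty;		/* how to dirty the heap buffer */
	bool		set_heap_lsn;	/* stamp the heap page with the record LSN */
} ToyHeapPagePlan;

/*
 * Decide what to do to the heap page before the VM is updated.
 *
 * page_all_visible:   PD_ALL_VISIBLE is already set on the page
 * buffer_dirty:       the heap buffer is already dirty
 * hint_bits_need_wal: checksums or wal_log_hints are enabled
 */
static ToyHeapPagePlan
toy_plan_heap_page_update(bool page_all_visible, bool buffer_dirty,
						  bool hint_bits_need_wal)
{
	ToyHeapPagePlan plan = {false, TOY_DIRTY_NONE, false};

	if (!page_all_visible)
	{
		plan.set_all_visible = true;

		/* Only an as-yet-clean buffer needs to be dirtied here. */
		if (!buffer_dirty)
			plan.dirty = hint_bits_need_wal ? TOY_DIRTY_FULL : TOY_DIRTY_HINT;

		/* The heap LSN is only advanced when the page is WAL-protected. */
		plan.set_heap_lsn = hint_bits_need_wal;
	}

	return plan;
}
```

In this model a page that already has PD_ALL_VISIBLE set is left alone
entirely, which is the case the commit message describes as previously
getting its LSN updated even though it was not dirtied.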



  [text/x-patch] v2-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.1K, 4-v2-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From b927fb837d0d0897620e2a805b0a8d517522a0bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v2 04/12] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c0608af7d29..e620f0a635b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2005,8 +2008,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2872,8 +2876,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3555,9 +3559,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * *visibility_cutoff_xid is set to the highest xmin amongst the visible
+ * tuples. *all_frozen is set to true if every tuple on this page is frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3565,9 +3576,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3590,7 +3603,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3614,9 +3627,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3637,7 +3650,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3672,7 +3685,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.34.1
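
The refactoring in the patch above replaces a dependency on LVRelState with
explicit inputs plus an out-parameter for the error-callback offset. The
shape of that interface can be sketched with a toy page made of an array of
tuples. Everything here is hypothetical and heavily simplified: ToyTuple,
the plain integer comparison of XIDs (the server uses wraparound-aware
comparisons), and the single frozen flag are stand-ins, not the real
visibility checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INVALID_OFFSET 0

/* Toy tuple: an inserting XID and whether the tuple is already frozen. */
typedef struct
{
	uint32_t	xmin;
	bool		frozen;
} ToyTuple;

/*
 * Returns true if every tuple is visible to all transactions, judged only
 * by xmin < oldest_xmin. Outputs:
 *   *all_frozen            true if every tuple is frozen
 *   *visibility_cutoff_xid highest xmin among the unfrozen visible tuples
 *   *logging_offnum        offset being processed, reset when done
 */
static bool
toy_page_is_all_visible(const ToyTuple *tuples, int ntuples,
						uint32_t oldest_xmin,
						bool *all_frozen,
						uint32_t *visibility_cutoff_xid,
						int *logging_offnum)
{
	bool		all_visible = true;

	*all_frozen = true;
	*visibility_cutoff_xid = 0;

	for (int offnum = 1; offnum <= ntuples; offnum++)
	{
		const ToyTuple *tup = &tuples[offnum - 1];

		/* Publish the current offset for the caller's error reporting. */
		*logging_offnum = offnum;

		if (tup->frozen)
			continue;

		/* An unfrozen tuple means the page cannot be all-frozen. */
		*all_frozen = false;

		if (!(tup->xmin < oldest_xmin))
		{
			all_visible = false;
			break;
		}

		if (tup->xmin > *visibility_cutoff_xid)
			*visibility_cutoff_xid = tup->xmin;
	}

	/* Clear the offset once the page has been processed. */
	*logging_offnum = TOY_INVALID_OFFSET;

	return all_visible;
}
```

A caller that still has a state struct can pass pointers to its own fields,
as the vacuumlazy.c call sites do with &vacrel->offnum, while a caller
without one can pass a local variable.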



  [text/x-patch] v2-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (24.7K, 5-v2-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
From c28b2edbd682e22546d4bce080728b2ef8a35601 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v2 05/12] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is all-visible after vacuum's third phase, use the VM-related options
when emitting the xl_heap_prune record with the changes vacuum makes in
phase III.
---
 src/backend/access/heap/heapam_xlog.c  | 148 ++++++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 146 ++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  13 ++-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   3 +
 6 files changed, 301 insertions(+), 66 deletions(-)
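
With VM flags carried in the xl_heap_prune record, the redo routine in the
heapam_xlog.c hunk of this patch no longer dirties the heap buffer and sets
its LSN unconditionally. The bookkeeping can be sketched as a pure function,
shown here under stated assumptions: the names are hypothetical,
hint_bits_need_wal stands in for XLogHintBitIsNeeded(), and the flag values
are toy copies rather than the real headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy copies of the VM flag values (assumed, not included from headers). */
#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02
#define TOY_VM_VALID_BITS  0x03

typedef struct
{
	bool		mark_buffer_dirty;	/* dirty the heap buffer */
	bool		set_heap_lsn;	/* stamp the heap page with the record LSN */
} ToyRedoPlan;

/*
 * Decide how replay treats the heap buffer.
 *
 * do_prune:           the record redirects, kills, or frees line pointers
 * nplans:             number of freeze plans in the record
 * vmflags:            VM bits carried by the record, 0 if none
 * page_all_visible:   PD_ALL_VISIBLE is already set on the page
 * hint_bits_need_wal: checksums or wal_log_hints are enabled
 */
static ToyRedoPlan
toy_plan_prune_redo(bool do_prune, int nplans, uint8_t vmflags,
					bool page_all_visible, bool hint_bits_need_wal)
{
	ToyRedoPlan plan;

	/* Pruning or freezing always modifies the page and stamps its LSN. */
	plan.mark_buffer_dirty = plan.set_heap_lsn = do_prune || nplans > 0;

	/* Setting PD_ALL_VISIBLE dirties the page, but only sometimes the LSN. */
	if ((vmflags & TOY_VM_VALID_BITS) && !page_all_visible)
	{
		if (hint_bits_need_wal)
			plan.set_heap_lsn = true;
		plan.mark_buffer_dirty = true;
	}

	return plan;
}
```

The case worth noticing is a record that only sets VM bits on a page that
already has PD_ALL_VISIBLE set: in this model the heap buffer is neither
dirtied nor given a new LSN, and only the VM block changes.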

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 71754fd77c4..70a46a37357 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = (Page) BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +169,78 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 *
+		 * Prior to Postgres 19, it was possible for the page-level bit to be
+		 * set and the VM bit to be clear. This could happen if we crashed
+		 * after setting PD_ALL_VISIBLE but before setting bits in the VM.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * Setting PD_ALL_VISIBLE only forces us to update the heap page
+			 * LSN if checksums or wal_log_hints are enabled (in which case we
+			 * must). This exposes us to torn page hazards, but since we're
+			 * not inspecting the existing page contents in any way, we don't
+			 * care.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +251,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e620f0a635b..56acb224d71 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2812,8 +2814,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2824,6 +2829,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2843,6 +2862,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		heap_page_set_vm(vacrel->rel,
+						 blkno, buffer,
+						 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2852,7 +2882,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2861,37 +2894,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
-								 vmbuffer, visibility_cutoff_xid,
-								 flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3557,6 +3565,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3570,23 +3597,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal().  If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3618,9 +3657,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3687,7 +3725,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets has changed, that's wrong */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5127fdb9c77..d2ac380bb64 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -343,6 +343,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -393,6 +399,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* If the record should update the VM, this is the new value */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.34.1
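
The LSN and full-page-image decisions made in log_heap_prune_and_freeze()
above can be condensed into two predicates. What follows is a minimal
standalone sketch, not code from the patch: the helper names
heap_page_needs_lsn() and heap_page_can_skip_fpi() are hypothetical, and
the hint_bits_logged argument stands in for the result of
XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: must the heap page LSN be advanced after the record
 * is inserted?  Mirrors the condition guarding the final PageSetLSN() call
 * on the heap buffer in the patch.
 */
static bool
heap_page_needs_lsn(bool do_prune, int nfrozen,
					bool set_pd_all_vis, bool hint_bits_logged)
{
	assert(nfrozen >= 0);
	return do_prune || nfrozen > 0 ||
		(set_pd_all_vis && hint_bits_logged);
}

/*
 * Hypothetical helper: may the heap buffer be registered with
 * REGBUF_NO_IMAGE?  Mirrors the condition that sets that flag in the patch.
 */
static bool
heap_page_can_skip_fpi(bool do_prune, int nfrozen,
					   bool set_pd_all_vis, bool hint_bits_logged)
{
	assert(nfrozen >= 0);
	return !do_prune && nfrozen == 0 &&
		(!set_pd_all_vis || !hint_bits_logged);
}
```

Written this way, the two conditions are exact complements (De Morgan):
the image is suppressed in precisely the cases where the heap page LSN is
left alone.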


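The final check in heap_page_is_all_visible_except_lpdead() above compares
the LP_DEAD offsets found on the page with the ones the caller is about to
mark LP_UNUSED. A minimal standalone sketch of that comparison follows.
The function name dead_offsets_match() is hypothetical, OffsetNumber is
redefined locally as a stand-in for the PostgreSQL typedef, and the sketch
assumes the caller's array is in ascending offset order, which is the
order a page scan produces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OffsetNumber;	/* local stand-in for the PostgreSQL type */

/*
 * Hypothetical helper: return true if the dead offsets found on the page
 * are exactly the ones the caller expected.
 */
static bool
dead_offsets_match(const OffsetNumber *found, size_t nfound,
				   const OffsetNumber *expected, size_t nexpected)
{
	/* Caller allowed no dead items: any dead item disqualifies the page */
	if (nexpected == 0)
		return nfound == 0;

	/* A different count means the page changed underneath us */
	if (nfound != nexpected)
		return false;

	assert(found != NULL && expected != NULL);

	/* Byte-wise comparison is order-sensitive: both arrays must be sorted */
	return memcmp(found, expected,
				  nexpected * sizeof(OffsetNumber)) == 0;
}
```

Because memcmp() is order-sensitive, the same set of offsets in a
different order would be reported as a mismatch, so this check only works
if both arrays are sorted the same way.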

  [text/x-patch] v2-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (6.7K, 6-v2-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
From b0e549ab6d941e04dd8a1380523aad249e7fdde9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:40:28 -0400
Subject: [PATCH v2 03/12] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, include the required update in the xl_heap_multi_insert
record instead.
---
 src/backend/access/heap/heapam.c       | 42 +++++++++++++++++---------
 src/backend/access/heap/heapam_xlog.c  | 33 +++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c |  5 +++
 3 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d125787fcb6..d2cf8aa9fb8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2506,6 +2503,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-visible. And if we've frozen everything on the
+		 * page, update the visibility map. We're already holding a pin on the
+		 * vmbuffer.
+		 */
+		else if (all_frozen_set)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			heap_page_set_vm(relation,
+							 BufferGetBlockNumber(buffer), buffer,
+							 vmbuffer,
+							 VISIBILITYMAP_ALL_VISIBLE |
+							 VISIBILITYMAP_ALL_FROZEN);
+		}
+
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
 		 */
@@ -2552,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2614,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2624,22 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer. It's fine to use
-		 * InvalidTransactionId as the cutoff_xid here - this is only used
-		 * when HEAP_INSERT_FROZEN is specified, which intentionally violates
-		 * visibility rules.
-		 */
 		if (all_frozen_set)
-			heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
-									 vmbuffer,
-									 InvalidTransactionId,
-									 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f2bc1bd06ee..71754fd77c4 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,36 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
-- 
2.34.1



  [text/x-patch] v2-0008-Combine-vacuum-phase-I-VM-update-cases.patch (4.2K, 7-v2-0008-Combine-vacuum-phase-I-VM-update-cases.patch)
From 8716ac80b1b9b840a34ebcc1012565ca0375e045 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v2 08/12] Combine vacuum phase I VM update cases

After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes the
code compact enough to reuse when updating the VM while pruning and
freezing.
---
 src/backend/access/heap/vacuumlazy.c | 71 +++++++++-------------------
 1 file changed, 22 insertions(+), 49 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c5f8484866..402b2bd65ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2151,11 +2151,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM, or it is and needs
+	 * to be marked all-frozen, update the VM.  Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2167,6 +2182,12 @@ lazy_scan_prune(LVRelState *vacrel,
 											  flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check old_vmbits to ensure that the visibility map
+		 * counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2186,54 +2207,6 @@ lazy_scan_prune(LVRelState *vacrel,
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
-											  vmbuffer, InvalidTransactionId,
-											  VISIBILITYMAP_ALL_VISIBLE |
-											  VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
 }
 
 /*
-- 
2.34.1
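
The merged condition in lazy_scan_prune() above should select the same
pages as the two branches it replaces. A small standalone sketch of that
equivalence follows. The function names are hypothetical, the four
booleans stand in for all_visible_according_to_vm, the result of
VM_ALL_FROZEN(), presult.all_visible and presult.all_frozen, and the
corruption branches that precede this code in the real function are
ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two separate branches before the patch */
static bool
vm_update_needed_old(bool vm_all_visible, bool vm_all_frozen,
					 bool page_all_visible, bool page_all_frozen)
{
	/* Case 1: VM bit not set yet, page found all-visible */
	if (!vm_all_visible && page_all_visible)
		return true;

	/* Case 2: already all-visible in the VM, now also all-frozen */
	if (vm_all_visible && page_all_visible &&
		page_all_frozen && !vm_all_frozen)
		return true;

	return false;
}

/* Hypothetical model of the single combined branch after the patch */
static bool
vm_update_needed_new(bool vm_all_visible, bool vm_all_frozen,
					 bool page_all_visible, bool page_all_frozen)
{
	return page_all_visible &&
		(!vm_all_visible || (page_all_frozen && !vm_all_frozen));
}
```

With only four boolean inputs, all sixteen combinations can be checked
exhaustively, and the two predicates agree on every one. The sketch does
not model the fact that the real code only probes the VM for the
all-frozen bit when the earlier terms of the condition are true.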



  [text/x-patch] v2-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 8-v2-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From f1a47d3e3ef4822689acedf1eea5557aa8fdd850 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v2 07/12] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 115 +++++++++++++++++----------
 1 file changed, 74 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68c8b0f4475..0c5f8484866 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,13 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1940,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2077,9 +2144,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2115,45 +2187,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.34.1



  [text/x-patch] v2-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 9-v2-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 6e021c54db3f723c814a73d431a30995d9256655 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v2 09/12] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 78 +++----------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 73 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 402b2bd65ca..9e0b0a31013 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,13 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2055,11 +1989,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2144,10 +2081,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1fa6eb047fd..0886867a161 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -41,6 +41,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -246,6 +247,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -385,6 +387,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.34.1



  [text/x-patch] v2-0010-Update-VM-in-pruneheap.c.patch (12.0K, 10-v2-0010-Update-VM-in-pruneheap.c.patch)
From cdf83732bb633199eab6016e08e7cc1c2185c144 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v2 10/12] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
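
Once the VM update lives in heap_page_prune_and_freeze(), the caller is left
with only the old and new VM bits to derive its logging counters from. A
minimal sketch of that bookkeeping, assuming the two bit values match
visibilitymapdefs.h; ToyCounters and toy_count_vm_change() are hypothetical
names used for illustration, with the fields standing in for the
vm_new_visible_pages, vm_new_visible_frozen_pages and vm_new_frozen_pages
counters:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the definitions in access/visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical holder for the three vacuum logging counters. */
typedef struct ToyCounters
{
	int			new_visible;		/* newly set all-visible */
	int			new_visible_frozen; /* newly all-visible and all-frozen */
	int			new_frozen;			/* already all-visible, newly all-frozen */
} ToyCounters;

static void
toy_count_vm_change(ToyCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		c->new_visible++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
			c->new_visible_frozen++;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* all-frozen is only ever set together with all-visible */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen++;
	}
}
```

If no VM update happened, both values are 0 in this sketch and no counter
moves.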
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 81 +++++------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 89 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..425dcc77534 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM, or it is and
+		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
+												  vmbuffer, presult->vm_conflict_horizon,
+												  vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9e0b0a31013..8daad54a0fe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1976,6 +1976,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1984,10 +1985,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2079,70 +2076,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
-	 */
-	if (presult.vm_corruption)
-	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
-											  vmbuffer, presult.vm_conflict_horizon,
-											  flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0886867a161..534a63aab31 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,20 +234,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.34.1



  [text/x-patch] v2-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (6.1K, 11-v2-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 86684d2c31ab2da25d742028fab502e67cc73545 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v2 06/12] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 ++++++--
 src/backend/access/heap/vacuumlazy.c | 54 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56acb224d71..68c8b0f4475 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1850,6 +1850,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 	if (PageIsEmpty(page))
 	{
+
 		/*
 		 * It seems likely that caller will always be able to get a cleanup
 		 * lock on an empty page.  But don't take any chances -- escalate to
@@ -1877,31 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			heap_page_set_vm(vacrel->rel, blkno, buf,
+							 vmbuffer, new_vmbits);
+
+			/* Should have set PD_ALL_VISIBLE and marked buf dirty */
+			Assert(BufferIsDirty(buf));
+
+			if (RelationNeedsWAL(vacrel->rel))
 			{
-				MarkBufferDirty(buf);
-				log_newpage_buffer(buf, true);
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 			}
 
-			heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
-									 vmbuffer, InvalidTransactionId,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2882,6 +2899,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d2ac380bb64..1fa6eb047fd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -399,6 +399,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.34.1



  [text/x-patch] v2-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (24.7K, 12-v2-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From 0aa2f93ff11a27c21f857326e90c813e765ecada Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v2 11/12] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
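Reviewer note (not part of the commit): the snapshot conflict horizon
selection in this patch has several interacting cases, so here is a
standalone sketch of the decision order as I read it from the hunk in
heap_page_prune_and_freeze(). Everything prefixed sk_/SK_/Sketch is a
hypothetical stand-in. XIDs are modeled as plain uint32_t, with the
comparison and retreat helpers written to approximate TransactionIdFollows()
and TransactionIdRetreat(); setting_all_frozen stands in for
(vmflags & VISIBILITYMAP_ALL_FROZEN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SK_INVALID_XID		0u
#define SK_FIRST_NORMAL_XID	3u

/* Modular "a is newer than b"; special XIDs compare as plain integers. */
static bool
sk_xid_follows(SketchXid a, SketchXid b)
{
	if (a < SK_FIRST_NORMAL_XID || b < SK_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one XID, skipping over the special (non-normal) values. */
static SketchXid
sk_xid_retreat(SketchXid x)
{
	do
	{
		x--;
	} while (x < SK_FIRST_NORMAL_XID);
	return x;
}

static SketchXid
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					bool all_frozen_except_lp_dead, bool blk_known_av,
					bool setting_all_frozen,
					SketchXid visibility_cutoff_xid, SketchXid oldest_xmin,
					SketchXid latest_xid_removed)
{
	SketchXid	conflict_xid = SK_INVALID_XID;

	/* VM update, or a freeze that leaves the page all-frozen */
	if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
		conflict_xid = visibility_cutoff_xid;
	/* Partial freeze: fall back to the pessimistic OldestXmin - 1 */
	else if (do_freeze)
		conflict_xid = sk_xid_retreat(oldest_xmin);

	/* Pruning may have removed a tuple with a newer xmax */
	if (sk_xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* Already all-visible page only being marked all-frozen: no conflict */
	if (!do_prune && !do_freeze && do_set_vm &&
		blk_known_av && setting_all_frozen)
		conflict_xid = SK_INVALID_XID;

	return conflict_xid;
}
```

For example, with visibility_cutoff_xid = 100 and OldestXmin = 200, a
VM-only update yields 100, a partial freeze yields 199, and a prune that
removed xmax 150 on an otherwise all-frozen page yields 150.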
 src/backend/access/heap/pruneheap.c  | 384 +++++++++++++++------------
 src/backend/access/heap/vacuumlazy.c |  30 ---
 src/include/access/heapam.h          |  15 +-
 3 files changed, 223 insertions(+), 206 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 425dcc77534..2d9624a246e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 *
 	 * all_frozen should only be considered valid if all_visible is also set;
 	 * we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
  * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
  * contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -496,29 +513,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping.
 	 */
-	if (prstate.freeze)
+	if (prstate.freeze || prstate.update_vm)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -534,12 +549,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build-only validation on the page using
+	 * this field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -827,6 +845,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = (vmflags & VISIBILITYMAP_VALID_BITS) != 0;
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -848,13 +928,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
 		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * the buffer dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -868,7 +948,23 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_prune || do_freeze)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = heap_page_set_vm(relation, blockno, buffer,
+										  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* Zero out vmflags so we don't emit VM update WAL */
+				vmflags = 0;
+			}
+		}
 
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
@@ -885,35 +981,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * That's because we won't have maintained the
+			 * visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with an xmax younger than the
+			 * conflict_xid calculated so far, use that xmax as the horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -925,124 +1043,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	presult->hastup = prstate.hastup;
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
-												  vmbuffer, presult->vm_conflict_horizon,
-												  vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1624,8 +1673,13 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
+	if (prstate->freeze || prstate->update_vm)
 	{
 		bool		totally_frozen;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8daad54a0fe..246ba07db9c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2012,34 +2012,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2073,8 +2045,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 534a63aab31..e35b4adf38d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,19 +234,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.34.1
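
Reviewer note on v2-0011 above: the choice of which VM bits to set, now made
inside heap_page_prune_and_freeze() before the critical section, can be
sketched as a pure function. This is a rough sketch under assumptions, not
the patch's code. sketch_choose_vmflags() and the SK_VM_* constants are
hypothetical stand-ins, fixed_corruption models a true return from
identify_and_fix_vm_corruption(), and vm_all_frozen models VM_ALL_FROZEN().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SK_VM_ALL_VISIBLE	0x01
#define SK_VM_ALL_FROZEN	0x02

static uint8_t
sketch_choose_vmflags(bool fixed_corruption, bool all_visible,
					  bool all_frozen, int lpdead_items,
					  bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		vmflags = 0;

	/* LP_DEAD items were ignored while deciding to freeze; count them now */
	if (lpdead_items > 0)
	{
		all_visible = false;
		all_frozen = false;
	}

	/* If corruption was repaired, do not touch the VM any further */
	if (fixed_corruption)
		return 0;

	/* Set bits only if the page is newly all-visible or newly all-frozen */
	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		vmflags |= SK_VM_ALL_VISIBLE;
		if (all_frozen)
			vmflags |= SK_VM_ALL_FROZEN;
	}

	return vmflags;
}
```

A zero result corresponds to do_set_vm being false, in which case the prune
record carries no VM update.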



  [text/x-patch] v2-0012-Remove-xl_heap_visible-entirely.patch (20.8K, 13-v2-0012-Remove-xl_heap_visible-entirely.patch)
From a1bbff2e42b771bbd8a4b8e2b0719e4582bfcf1f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v2 12/12] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.

ci-os-only:
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         | 113 +----------------
 src/backend/access/heap/heapam_xlog.c    | 150 +----------------------
 src/backend/access/heap/vacuumlazy.c     |   4 +-
 src/backend/access/heap/visibilitymap.c  |  84 +------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/include/access/heapam.h              |   3 -
 src/include/access/heapam_xlog.h         |   6 -
 src/include/access/visibilitymap.h       |  10 +-
 10 files changed, 14 insertions(+), 370 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d2cf8aa9fb8..6f134dfd535 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -7845,73 +7846,6 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 	return false;
 }
 
-/*
- * Make the heap and VM page changes needed to set a page all-visible.
- * Do not call in recovery.
- */
-uint8
-heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
-						 Buffer vmbuf, TransactionId cutoff_xid,
-						 uint8 vmflags)
-{
-	Page		heap_page = BufferGetPage(heap_buf);
-	bool		set_heap_lsn = false;
-	XLogRecPtr	recptr = InvalidXLogRecPtr;
-	uint8		old_vmbits = 0;
-
-	Assert(BufferIsValid(heap_buf));
-
-	START_CRIT_SECTION();
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferGetBlockNumber(heap_buf) != heap_blk)
-		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
-
-	/*
-	 * We must never end up with the VM bit set and the page-level
-	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
-	 * modification would fail to clear the VM bit. Though it is possible for
-	 * the page-level bit to be set and the VM bit to be clear if checksums
-	 * and wal_log_hints are not enabled.
-	 */
-	if (!PageIsAllVisible(heap_page))
-	{
-		PageSetAllVisible(heap_page);
-
-		/*
-		 * Buffer will usually be dirty from other changes, so it is worth the
-		 * extra check
-		 */
-		if (!BufferIsDirty(heap_buf))
-		{
-			if (XLogHintBitIsNeeded())
-				MarkBufferDirty(heap_buf);
-			else
-				MarkBufferDirtyHint(heap_buf, true);
-		}
-
-		set_heap_lsn = XLogHintBitIsNeeded();
-	}
-
-	old_vmbits = visibilitymap_set(rel, heap_blk, heap_buf,
-								   &recptr, vmbuf, cutoff_xid, vmflags);
-
-	/*
-	 * If we modified the heap page and data checksums are enabled (or
-	 * wal_log_hints=on), we need to protect the heap page from being torn.
-	 *
-	 * If not, then we must *not* update the heap page's LSN. In this case,
-	 * the FPI for the heap page was omitted from the WAL record inserted in
-	 * the VM record, so it would be incorrect to update the heap page's LSN.
-	 */
-	if (set_heap_lsn)
-		PageSetLSN(heap_page, recptr);
-
-	END_CRIT_SECTION();
-
-	return old_vmbits;
-}
-
 /*
  * Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
  * provided vmflags in the provided vmbuf.
@@ -7953,7 +7887,7 @@ heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 		MarkBufferDirty(heap_buf);
 	}
 
-	return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+	return visibilitymap_set(rel, heap_blk, vmbuf, vmflags);
 }
 
 /*
@@ -8895,49 +8829,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 70a46a37357..975a59d717e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -273,7 +273,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -284,143 +284,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, &lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -797,10 +660,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_set_vmbyte(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -1380,9 +1243,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 246ba07db9c..9371d6f37c1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,14 +1878,14 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
-			uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
 				VISIBILITYMAP_ALL_FROZEN;
 
 			START_CRIT_SECTION();
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 			heap_page_set_vm(vacrel->rel, blkno, buf,
-										  vmbuffer, new_vmbits);
+							 vmbuffer, new_vmbits);
 
 			/* Should have set PD_ALL_VISIBLE and marked buf dirty */
 			Assert(BufferIsDirty(buf));
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index cabd0fa0880..a24554fe191 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,86 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap buffer.
- * When checksums are enabled and we're not in recovery, we must add the heap
- * buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * cutoff_xid is the largest xmin on the page being marked all-visible; it is
- * needed for Hot Standby, and can be InvalidTransactionId if the page
- * contains no tuples.  It can also be set to InvalidTransactionId when a page
- * that is already all-visible is being marked all-frozen.
- *
- * If we're in recovery, recptr points to the LSN of the XLOG record we're
- * replaying and the VM page LSN is advanced to this LSN. During normal
- * running, we'll generate a new XLOG record for the changes to the VM and set
- * the VM page LSN. We will return this LSN in recptr, and the caller may use
- * this to set the heap page LSN.
- *
- * Returns the state of the page's VM bits before setting flags and sets.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr *recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(*recptr));
-	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(*recptr))
-			{
-				Assert(!InRecovery);
-				*recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-			}
-			PageSetLSN(page, *recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
@@ -308,8 +228,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * making any changes needed to the associated heap page.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e35b4adf38d..c404b794fda 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,9 +365,6 @@ extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
 
 extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
 							  Buffer vmbuf, uint8 vmflags);
-extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
-									  Buffer vmbuf, TransactionId cutoff_xid,
-									  uint8 vmflags);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
 extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..9a61434b881 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -495,11 +494,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 91ef3705e84..20141e3e805 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -31,14 +31,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr *recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.34.1




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-07-09 21:59  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-07-09 21:59 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Thu, Jun 26, 2025 at 6:04 PM Melanie Plageman
<[email protected]> wrote:
>
> Rebased in light of recent changes on master:

This needed another rebase, and, in light of the discussion in [1],
I've also removed the patch to add heap wrappers for setting pages
all-visible.

More notably, the final patch (0012) in attached v3 allows on-access
pruning to set the VM.

To do this, it plumbs some information down from the executor to the
table scan about whether or not the table is modified by the query. We
don't want to set the VM only to clear it while scanning pages for an
UPDATE or while locking rows in a SELECT FOR UPDATE.

Because we only do on-access pruning when pd_prune_xid is valid, we
shouldn't need much of a heuristic for deciding when to set the VM
on-access -- but I've included one anyway: we only do it if we are
actually pruning or if the page is already dirty and no FPI would be
emitted.
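
As a rough illustration of that heuristic, here is a minimal standalone
sketch in C. It is not code from the patch set: the struct, its fields,
and the function name are hypothetical stand-ins for whatever state the
executor and the pruning code actually pass around. It also covers only
the "should we bother" decision; it assumes the caller has already
established that pd_prune_xid is valid and that the page really is
all-visible.

```c
#include <stdbool.h>

/* Hypothetical inputs; none of these names come from the patch set. */
typedef struct OnAccessVMInputs
{
	bool		rel_modified_by_query;	/* e.g. UPDATE or SELECT FOR UPDATE target */
	bool		did_prune;				/* on-access pruning changed the page */
	bool		page_already_dirty;		/* buffer was dirty before we got here */
	bool		fpi_needed;				/* WAL-logging now would emit an FPI */
} OnAccessVMInputs;

/*
 * Decide whether setting the VM on-access looks worthwhile.
 *
 * Assumption: this mirrors only the heuristic described in the email,
 * not the actual control flow of the patch.
 */
static bool
should_set_vm_on_access(const OnAccessVMInputs *in)
{
	/* Don't set the VM only to have the same query clear it again. */
	if (in->rel_modified_by_query)
		return false;

	/* If we are pruning anyway, the page is being modified and logged. */
	if (in->did_prune)
		return true;

	/* Otherwise only proceed when it is cheap: dirty page, no FPI. */
	return in->page_already_dirty && !in->fpi_needed;
}
```

The point of the last branch is that a read-only query should not be
the one to dirty a clean page or pay for a full-page image just to set
a VM bit.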

You can see it in action with the following:

create extension pg_visibility;
create table foo (a int, b int) with (autovacuum_enabled=false, fillfactor=90);
insert into foo select generate_series(1,300), generate_series(1,300);
create index on foo (a);
update foo set b = 51 where b = 50;
select * from foo where a = 50;
select * from pg_visibility_map_summary('foo');

The SELECT will set a page all-visible in the VM.
In this patch set, on-access pruning is enabled for sequential scans
and the underlying heap relation in index scans and bitmap heap scans.
This example can exercise any of the three if you toggle
enable_indexscan and enable_bitmapscan appropriately.
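
For readers less familiar with what "setting a page all-visible in the
VM" means at the bit level, here is a small self-contained model in C.
The bit values and the two-bits-per-heap-block layout follow
visibilitymapdefs.h and visibilitymap.c, and the status/OR logic follows
visibilitymap_set_vmbyte() from the patch. Everything prefixed toy_ or
TOY_ is made up for illustration: the map is a bare byte array with no
buffer, page header, locking, or WAL.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK			2
#define HEAPBLOCKS_PER_BYTE			4	/* 8 bits / 2 bits per heap block */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Hypothetical toy size; a real VM page holds far more bytes. */
#define TOY_MAP_BYTES 16

/* Return the two status bits currently stored for heap_blk. */
static uint8_t
toy_vm_get(const uint8_t *map, uint32_t heap_blk)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(map_byte < TOY_MAP_BYTES);
	return (map[map_byte] >> map_offset) & VISIBILITYMAP_VALID_BITS;
}

/* Set flags for heap_blk and return the bits that were set beforehand. */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert(map_byte < TOY_MAP_BYTES);
	/* Only valid bits, and never all-frozen without all-visible. */
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = toy_vm_get(map, heap_blk);
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Because the function only ORs bits in, it can upgrade a block from
all-visible to all-frozen but can never clear anything; clearing is a
separate path (visibilitymap_clear() in the real code).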




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-07-11 22:19  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-07-11 22:19 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey M. Borodin <[email protected]>

On Wed, Jul 9, 2025 at 5:59 PM Melanie Plageman
<[email protected]> wrote:
>
> On Thu, Jun 26, 2025 at 6:04 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > Rebased in light of recent changes on master:
>
> This needed another rebase, and, in light of the discussion in [1],
> I've also removed the patch to add heap wrappers for setting pages
> all-visible.

Andrey Borodin made the excellent point off-list that I forgot to
remove the xl_heap_visible struct itself -- which is rather important
to a patch set purporting to eliminate xl_heap_visible! New version
attached.


- Melanie


Attachments:

  [application/x-patch] v4-0002-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (10.2K, 2-v4-0002-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
  download | inline diff:
From 68f26df83f6ac0f8ce9a8c73894d2298c5273996 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v4 02/13] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate xl_heap_visible WAL record to set the VM
bits, include the required update in the xl_heap_multi_insert record.
---
 src/backend/access/heap/heapam.c        | 47 +++++++++++---------
 src/backend/access/heap/heapam_xlog.c   | 39 +++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 59 +++++++++++++++++++++++++
 src/backend/access/rmgrdesc/heapdesc.c  |  5 +++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..573df6f6891 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -318,6 +318,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
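
Editorial sketch (not part of the patch): the bit arithmetic in
visibilitymap_set_vmbyte() can be tried out in isolation. The version below
is a minimal model that treats one VM page's contents as a flat byte array
and assumes two status bits per heap block, four heap blocks per byte. All
sk_/SK_ names and their values are stand-ins invented for this example, and
the real macros additionally reduce the heap block number modulo the number
of heap blocks tracked per VM page.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch-local stand-ins; names and values are assumptions for this example. */
#define SK_BITS_PER_HEAPBLOCK	2
#define SK_HEAPBLOCKS_PER_BYTE	4
#define SK_ALL_VISIBLE			0x01
#define SK_ALL_FROZEN			0x02
#define SK_VALID_BITS			0x03

/*
 * Set status bits for heap_blk in a byte array standing in for the contents
 * of a single VM page. Bits are only ever added, never cleared. Returns the
 * bits that were set before the call, mirroring the patch's return value.
 */
static uint8_t
sk_set_vmbyte(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / SK_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % SK_HEAPBLOCKS_PER_BYTE) *
										SK_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Flags must be valid, and all-frozen is never set without all-visible */
	assert((flags & SK_VALID_BITS) == flags);
	assert(flags != SK_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & SK_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Because the old bits are returned, a caller can compare them with the
requested flags to decide whether the page actually changed, which is how
the redo routine in 0004 decides whether to set the VM page LSN.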



  [application/x-patch] v4-0003-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.1K, 3-v4-0003-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From 0846b7106d72c6ade04eccebe51dcc1e1cedd39a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v4 03/13] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [application/x-patch] v4-0005-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (5.8K, 4-v4-0005-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 60b36cd8b4d9e2de125690a7fcfbab7330c12287 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v4 05/13] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 68ecf50848b..2724cf7f64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2052,6 +2053,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2062,6 +2066,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2090,13 +2095,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0



  [application/x-patch] v4-0001-Add-assert-to-heap_prune_record_unchanged_lp_norm.patch (963B, 5-v4-0001-Add-assert-to-heap_prune_record_unchanged_lp_norm.patch)
From d98156d3d8ac522381dc3ccf9a8608168649fdfe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 7 Jul 2025 17:33:26 -0400
Subject: [PATCH v4 01/13] Add assert to heap_prune_record_unchanged_lp_normal

Not all callers provide VacuumCutoffs to heap_page_prune_and_freeze(),
so assert those are provided before passing them along to
heap_prepare_freeze_tuple().
---
 src/backend/access/heap/pruneheap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..dd00931f179 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1480,6 +1480,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	{
 		bool		totally_frozen;
 
+		Assert(prstate->cutoffs);
 		if ((heap_prepare_freeze_tuple(htup,
 									   prstate->cutoffs,
 									   &prstate->pagefrz,
-- 
2.43.0



  [application/x-patch] v4-0004-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (24.4K, 6-v4-0004-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
From 51900f8b94a1ebfc7777a9d9a4af379be8597ceb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v4 04/13] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
 src/backend/access/heap/heapam_xlog.c  | 142 ++++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 149 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  13 ++-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   3 +
 6 files changed, 296 insertions(+), 68 deletions(-)
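
Editorial sketch (not part of the patch): the main data of the combined
record carries a fixed header followed by optional fields, each present only
when its flag is set and always in the same order (conflict horizon first,
then vmflags). The model below illustrates that convention with a one-byte
header. The SkPruneMain struct, the SK_* flags and both functions are
hypothetical names for this example and do not reproduce the real
xl_heap_prune layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for this sketch only. */
#define SK_HAS_CONFLICT_HORIZON	0x01
#define SK_HAS_VMFLAGS			0x02

typedef struct SkPruneMain
{
	uint8_t		flags;
	uint32_t	conflict_xid;	/* 0 when the field is absent */
	uint8_t		vmflags;		/* 0 when the field is absent */
} SkPruneMain;

/* Write the header, then each optional field only if its flag is set. */
static size_t
sk_encode(uint8_t *out, uint8_t flags, uint32_t conflict_xid, uint8_t vmflags)
{
	uint8_t    *p = out;

	*p++ = flags;
	if (flags & SK_HAS_CONFLICT_HORIZON)
	{
		memcpy(p, &conflict_xid, sizeof(conflict_xid));
		p += sizeof(conflict_xid);
	}
	if (flags & SK_HAS_VMFLAGS)
	{
		memcpy(p, &vmflags, sizeof(vmflags));
		p += sizeof(vmflags);
	}
	return (size_t) (p - out);
}

/* Read the fields back in the same order, advancing past each one found. */
static size_t
sk_decode(const uint8_t *in, SkPruneMain *rec)
{
	const uint8_t *p = in;

	rec->flags = *p++;
	rec->conflict_xid = 0;
	rec->vmflags = 0;
	if (rec->flags & SK_HAS_CONFLICT_HORIZON)
	{
		memcpy(&rec->conflict_xid, p, sizeof(rec->conflict_xid));
		p += sizeof(rec->conflict_xid);
	}
	if (rec->flags & SK_HAS_VMFLAGS)
	{
		memcpy(&rec->vmflags, p, sizeof(rec->vmflags));
		p += sizeof(rec->vmflags);
	}
	return (size_t) (p - in);
}
```

The writer and the reader have to agree on the field order, which is why
the redo routine advances its data pointer past the conflict horizon before
copying out vmflags.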

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = (Page) BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode), which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd00931f179..68ecf50848b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2046,12 +2048,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2063,6 +2076,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2071,8 +2085,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2080,7 +2105,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2137,6 +2166,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2151,6 +2182,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2169,5 +2202,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbyte(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets has changed, that's wrong */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool vm_modified_heap_page,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* If the record should update the VM, this is the new value */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0
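
For readers following the final comparison in
heap_page_is_all_visible_except_lpdead() above, here is a minimal standalone
sketch of that check. It is not code from the patch: dead_offsets_match() is a
hypothetical helper name, OffsetNumber is redeclared locally as a 16-bit
unsigned stand-in, and the sketch assumes both arrays are in ascending offset
order (which a front-to-back scan of the page would produce).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for PostgreSQL's OffsetNumber (assumption for this sketch). */
typedef uint16_t OffsetNumber;

/*
 * Hypothetical helper: return true only if the LP_DEAD offsets observed on
 * the page are exactly the ones the caller expected to find.  Both arrays
 * are assumed to be sorted in ascending offset order.
 */
bool
dead_offsets_match(const OffsetNumber *observed, size_t nobserved,
				   const OffsetNumber *expected, size_t nexpected)
{
	/* Caller allowed no dead items: the page must have none at all. */
	if (nexpected == 0)
		return nobserved == 0;

	/* A different count means the page changed underneath the caller. */
	if (nobserved != nexpected)
		return false;

	/* From here on, both arrays must be present. */
	assert(observed != NULL && expected != NULL);

	/* Same count: the offsets themselves must be identical. */
	return memcmp(observed, expected,
				  nexpected * sizeof(OffsetNumber)) == 0;
}
```

With expected = {2, 5, 9}, an observed set of {2, 5, 9} matches, while
{2, 6, 9} or any set of a different length does not, so the page would not be
reported as all-visible.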



  [application/x-patch] v4-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (25.7K, 7-v4-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From bf9f5caacfb0f1f12bff35e8cd004519deea6e11 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v4 10/13] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 402 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 237 insertions(+), 210 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 797b3710862..6208f55176f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 *
 	 * all_frozen should only be considered valid if all_visible is also set;
 	 * we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
  * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
  * contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -496,50 +513,53 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * If neither freezing nor updating the VM, we avoid the extra bookkeeping.
+	 * Initializing all_visible and all_frozen to false allows skipping the
+	 * work to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -827,6 +847,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -848,13 +930,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
 		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * the buffer dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -868,12 +950,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
+
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit VM update WAL */
+				vmflags = 0;
+			}
+		}
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm above. As such, check it again before
+		 * emitting the record.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -885,35 +989,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
+
+			/*
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * That's because we won't have maintained the
+			 * visibility_cutoff_xid.
+			 */
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with an xmax younger than the
+			 * conflict_xid calculated so far, that xmax must be our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -925,124 +1051,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
-	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1624,7 +1681,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2238,6 +2300,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0



  [application/x-patch] v4-0009-Update-VM-in-pruneheap.c.patch (12.7K, 8-v4-0009-Update-VM-in-pruneheap.c.patch)
From 4aae350102d197fb511b45b478fe887e1900c3a7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v4 09/13] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 070d64fa9c3..797b3710862 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page))
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6110b7f80ce..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,87 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
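
The VM update decision and the logging counters in the patch above can be sketched in isolation. This is a minimal, standalone illustration, not code from the patch set: the macro names VM_ALL_VISIBLE and VM_ALL_FROZEN are local stand-ins (with assumed values) for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and choose_vmflags(), count_vm_transition(), and VmCounters are hypothetical names. The boolean parameters replace the buffer and relation lookups that the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the visibility map flag bits (assumed values). */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical counters mirroring the vacrel fields used for logging. */
typedef struct VmCounters
{
	int64_t		new_visible_pages;
	int64_t		new_visible_frozen_pages;
	int64_t		new_frozen_pages;
} VmCounters;

/*
 * Decide which VM bits to set, following the condition in the patch:
 * the page must be all-visible, and either the VM did not know that yet
 * or the page is all-frozen and the VM's all-frozen bit is still clear.
 * Returns 0 when no VM update is needed.
 */
static uint8_t
choose_vmflags(bool all_visible, bool all_frozen,
			   bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		vmflags = 0;

	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		vmflags = VM_ALL_VISIBLE;
		if (all_frozen)
			vmflags |= VM_ALL_FROZEN;
	}
	return vmflags;
}

/*
 * Update the logging counters from the bits before and after the update.
 * Returns true if the page was newly set all-frozen.
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		newly_frozen = false;

	if ((old_vmbits & VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & VM_ALL_VISIBLE) != 0)
	{
		c->new_visible_pages++;
		if ((new_vmbits & VM_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			newly_frozen = true;
		}
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & VM_ALL_FROZEN) != 0)
	{
		/* all-frozen is only ever set together with all-visible */
		assert((new_vmbits & VM_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		newly_frozen = true;
	}
	return newly_frozen;
}
```

A caller would compute the new bits with choose_vmflags(), obtain the old bits from whatever routine sets the map, and pass both to count_vm_transition(). When no update happens both values are 0 and no counter changes, which matches how the patch initializes old_vmbits and vmflags.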



  [application/x-patch] v4-0006-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 9-v4-0006-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From 9acac2dcc61134502a305e055d2d5403c9c3d559 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v4 06/13] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0
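
The two corruption checks extracted by the patch above can be expressed as a pure classification over the inputs they examine. The following is only a sketch under stated assumptions: classify_vm_corruption() and the VmCorruption enum are hypothetical names, and the page and VM lookups (PageIsAllVisible(), visibilitymap_get_status()) are replaced by plain parameters so that the ordering of the checks can be read and tested on its own. It does not perform the repairs (clearing bits, dirtying the buffer, emitting a WARNING) that the real function does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; the patch's function returns a plain bool. */
typedef enum VmCorruption
{
	VM_CORRUPTION_NONE,
	VM_BIT_SET_PAGE_BIT_CLEAR,	/* VM says all-visible, page header does not */
	LP_DEAD_ON_ALL_VISIBLE_PAGE	/* page header says all-visible, has LP_DEAD */
} VmCorruption;

/*
 * blk_known_av:     VM status remembered from before the page was locked
 * page_all_visible: current page-level all-visible flag
 * vm_status_now:    VM bits rechecked while holding the buffer lock
 * nlpdead_items:    LP_DEAD items found on the page by pruning
 */
static VmCorruption
classify_vm_corruption(bool blk_known_av, bool page_all_visible,
					   uint8_t vm_status_now, int64_t nlpdead_items)
{
	assert(nlpdead_items >= 0);

	/*
	 * The remembered status may be stale, so the VM is only considered
	 * corrupt if the rechecked status still has a bit set.
	 */
	if (blk_known_av && !page_all_visible && vm_status_now != 0)
		return VM_BIT_SET_PAGE_BIT_CLEAR;

	/* An all-visible page must never contain LP_DEAD items. */
	if (nlpdead_items > 0 && page_all_visible)
		return LP_DEAD_ON_ALL_VISIBLE_PAGE;

	return VM_CORRUPTION_NONE;
}
```

Any result other than VM_CORRUPTION_NONE corresponds to the cases where the patch returns true and the caller skips the normal VM update for that page.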



  [application/x-patch] v4-0008-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 10-v4-0008-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 2c08444d6ef0f22a978aa1f8b099cee7517930f0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v4 08/13] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 72 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2724cf7f64f..070d64fa9c3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index da71f095da9..6110b7f80ce 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0



  [application/x-patch] v4-0007-Combine-vacuum-phase-I-VM-update-cases.patch (5.6K, 11-v4-0007-Combine-vacuum-phase-I-VM-update-cases.patch)
From 775d6c17d83095bac01b2e0b7e344d809b5ded7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v4 07/13] Combine vacuum phase I VM update cases

After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes the
code compact enough to reuse when updating the VM while pruning and
freezing.
---
 src/backend/access/heap/vacuumlazy.c | 100 +++++++++------------------
 1 file changed, 31 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..da71f095da9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM, or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,28 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2204,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
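
The merged branch condition in the patch above can be sketched as a small standalone C function. This is only an illustration, not code from the patch set: the helper name vm_flags_to_set() and its boolean parameters are hypothetical stand-ins for presult.all_visible, presult.all_frozen, all_visible_according_to_vm, and the VM_ALL_FROZEN() lookup, and it assumes (as in the surrounding vacuumlazy.c code) that the all-frozen flag is OR'ed in whenever all_frozen is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/*
 * Hypothetical helper (not part of the patch): return the VM flags that the
 * combined branch would set, or 0 when the VM should be left alone.
 */
uint8_t
vm_flags_to_set(bool vm_corruption_cleared,
				bool all_visible_according_to_vm,
				bool all_visible,
				bool all_frozen,
				bool vm_all_frozen)
{
	uint8_t		flags;

	/* Don't update the VM if corruption in it was just cleared */
	if (vm_corruption_cleared)
		return 0;

	/* all_frozen is only meaningful when all_visible is true */
	if (!all_visible)
		return 0;

	/* Already all-visible in the VM and no new all-frozen bit to add */
	if (all_visible_according_to_vm && !(all_frozen && !vm_all_frozen))
		return 0;

	flags = VISIBILITYMAP_ALL_VISIBLE;
	if (all_frozen)
		flags |= VISIBILITYMAP_ALL_FROZEN;

	/* Must never set all_frozen bit without also setting all_visible bit */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);
	return flags;
}
```

The two branches that existed before the patch correspond to the two ways of reaching the final block: either the VM did not yet show the page as all-visible, or it did and only the all-frozen bit is missing.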



  [application/x-patch] v4-0011-Remove-xl_heap_visible-entirely.patch (22.3K, 12-v4-0011-Remove-xl_heap_visible-entirely.patch)
From b554be605998123bb1e57edc6669147aa8f979a6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v4 11/13] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 154 +----------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 103 +--------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 12 files changed, 23 insertions(+), 357 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..64f06d46bf1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -82,10 +82,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		memcpy(&vmflags, maindataptr, sizeof(uint8));
 		maindataptr += sizeof(uint8);
 
-		/*
-		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
-		 * because we already have XLHP_IS_CATALOG_REL.
-		 */
 		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 		/* Must never set all_frozen bit without also setting all_visible bit */
 		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -267,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +650,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_set_vmbyte(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -1380,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6208f55176f..f6509695e3a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -959,8 +959,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbyte(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 573df6f6891..478b08fa520 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -337,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * making any changes needed to the associated heap page.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..a64677b7bca 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -438,20 +437,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -495,11 +480,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 83192038571..e65094cb5df 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4269,7 +4269,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
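
After this patch, visibilitymap_set() only manipulates the two bits for one heap block inside an already pinned and locked VM buffer, and returns the previous bits so that callers can decide whether the VM page changed (for example, whether to call PageSetLSN() during replay). A minimal sketch of that bit arithmetic follows. It is not the PostgreSQL implementation: toy_vm_set() and TOY_MAPSIZE are made up for illustration, a plain byte array stands in for the VM page contents, and buffer locking, MarkBufferDirty(), and WAL are omitted entirely.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

#define BITS_PER_HEAPBLOCK	2	/* two status bits per heap block */
#define HEAPBLOCKS_PER_BYTE	4	/* so four heap blocks per map byte */

/* Assumption for this sketch: a toy map of 16 bytes instead of a real page */
#define TOY_MAPSIZE			16

/*
 * Hypothetical stand-in for visibilitymap_set(): set flags for heapBlk in
 * map and return the state of the block's bits before the update.
 */
uint8_t
toy_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = (heapBlk % (TOY_MAPSIZE * HEAPBLOCKS_PER_BYTE)) /
		HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Must never set all_frozen bit without also setting all_visible bit */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
		map[mapByte] |= (uint8_t) (flags << mapOffset);

	return status;
}
```

A caller following the pattern used in the redo routines above would compare the return value with the flags it passed and treat the map as modified only when they differ.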



  [application/x-patch] v4-0012-Allow-on-access-pruning-to-set-pages-all-visible.patch (29.4K, 13-v4-0012-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From e60710ccff532c6f6da9c470edc6eab9ecdbc37c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 7 Jul 2025 17:30:14 -0400
Subject: [PATCH v4 12/13] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c          | 15 +++++-
 src/backend/access/heap/heapam_handler.c  | 17 ++++++-
 src/backend/access/heap/pruneheap.c       | 59 +++++++++++++++++------
 src/backend/access/index/indexam.c        | 46 ++++++++++++++++++
 src/backend/access/table/tableam.c        | 39 +++++++++++++--
 src/backend/executor/execMain.c           |  4 ++
 src/backend/executor/execUtils.c          |  2 +
 src/backend/executor/nodeBitmapHeapscan.c |  6 ++-
 src/backend/executor/nodeIndexscan.c      | 17 ++++---
 src/backend/executor/nodeSeqscan.c        | 17 +++++--
 src/backend/storage/ipc/procarray.c       | 12 +++++
 src/include/access/genam.h                | 11 +++++
 src/include/access/heapam.h               | 24 +++++++--
 src/include/access/relscan.h              |  6 +++
 src/include/access/tableam.h              | 30 +++++++++++-
 src/include/nodes/execnodes.h             | 17 +++++++
 src/include/utils/snapmgr.h               |  1 +
 17 files changed, 285 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..50b0d169d54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -560,6 +560,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	int			lines;
 	bool		all_visible;
 	bool		check_serializable;
+	bool		allow_vmset;
 
 	Assert(BufferGetBlockNumber(buffer) == block);
 
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	allow_vmset = sscan->rs_flags & SO_ALLOW_VM_SET;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer,
+						allow_vmset ? &scan->rs_vmbuffer : NULL, allow_vmset);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..fb450c5a84f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,9 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer,
+								!scan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	bool		allow_vmset = false;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,10 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	allow_vmset = scan->rs_flags & SO_ALLOW_VM_SET;
+	heap_page_prune_opt(scan->rs_rd, buffer,
+						allow_vmset ? &hscan->rs_vmbuffer : NULL,
+						allow_vmset);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f6509695e3a..af23008ddf7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -158,6 +158,7 @@ typedef struct
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId visibility_cutoff_xid;
+	TransactionId oldest_xmin;
 } PruneState;
 
 /* Local functions */
@@ -203,9 +204,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If allow_vmset is true, it is okay for pruning to set the visibility map if
+ * the page is all visible.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer,
+					Buffer *vmbuffer, bool allow_vmset)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -260,6 +265,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		if (!ConditionalLockBufferForCleanup(buffer))
 			return;
 
+		/* Caller should not pass a vmbuffer if allow_vmset is false. */
+		Assert(allow_vmset || vmbuffer == NULL);
+
 		/*
 		 * Now that we have buffer lock, get accurate information about the
 		 * page's free space, and recheck the heuristic about whether to
@@ -269,6 +277,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (allow_vmset)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -276,8 +291,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -467,6 +482,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
+	if (cutoffs)
+		prstate.oldest_xmin = cutoffs->OldestXmin;
+	else
+		prstate.oldest_xmin = OldestXminFromGlobalVisState(vistest);
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -877,6 +896,20 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (prstate.update_vm)
 	{
+		/*
+		 * If this is on-access and we aren't actually pruning, don't set the
+		 * VM if doing so would newly dirty the heap page or, if the page is
+		 * already dirty, if the WAL record emitted would have to contain an
+		 * FPI of the heap page. This should rarely happen, as we only attempt
+		 * on-access pruning when pd_prune_xid is valid.
+		 */
+		if (reason == PRUNE_ON_ACCESS &&
+			!do_prune && !do_freeze &&
+			(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+		{
+			/* Don't update the VM */
+		}
+
 		/*
 		 * Clear any VM corruption. This does not need to be in a critical
 		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
@@ -885,9 +918,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 * of VM corruption, so we don't have to worry about the extra
 		 * performance overhead.
 		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		else if (identify_and_fix_vm_corruption(relation,
+												blockno, buffer, page,
+												blk_known_av, prstate.lpdead_items, vmbuffer))
 		{
 			/* If we fix corruption, don't update the VM further */
 		}
@@ -1013,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 */
 			else if (do_freeze)
 			{
-				conflict_xid = prstate.cutoffs->OldestXmin;
+				conflict_xid = prstate.oldest_xmin;
 				TransactionIdRetreat(conflict_xid);
 			}
 
@@ -1071,12 +1104,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.oldest_xmin,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1136,9 +1167,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * vacuuming the relation. OldestXmin is used for freezing determination
 	 * and we cannot freeze dead tuples' xmaxes.
 	 */
-	if (prstate->cutoffs &&
-		TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
-		NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
+	if (TransactionIdIsValid(prstate->oldest_xmin) &&
+		NormalTransactionIdPrecedes(dead_after, prstate->oldest_xmin))
 		return HEAPTUPLE_DEAD;
 
 	/*
@@ -1607,8 +1637,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 * could use GlobalVisTestIsRemovableXid instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
+				if (!TransactionIdPrecedes(xmin, prstate->oldest_xmin))
 				{
 					prstate->all_visible = false;
 					break;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..2c57bc7ac49 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -109,7 +109,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   node->modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
@@ -360,6 +361,9 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->initialized = false;
 	scanstate->pstate = NULL;
 	scanstate->recheck = true;
+	scanstate->modifies_rel =
+		bms_is_member(node->scan.scanrelid,
+					  estate->es_modified_relids);
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..f91c6b17620 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -106,12 +106,13 @@ IndexNext(IndexScanState *node)
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 node->iss_ModifiesBaseRel);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -935,6 +936,10 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ss_currentRelation = currentRelation;
 	indexstate->ss.ss_currentScanDesc = NULL;	/* no heap scan here */
 
+	indexstate->iss_ModifiesBaseRel =
+		bms_is_member(node->scan.scanrelid,
+					  estate->es_modified_relids);
+
 	/*
 	 * get the scan type from the relation descriptor.
 	 */
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..cded7f15703 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -69,9 +69,9 @@ SeqNext(SeqScanState *node)
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, node->modifies_rel);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -237,6 +237,10 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 							 node->scan.scanrelid,
 							 eflags);
 
+	scanstate->modifies_rel =
+		bms_is_member(node->scan.scanrelid,
+					  estate->es_modified_relids);
+
 	/* and create slot with the appropriate rowtype */
 	ExecInitScanTupleSlot(estate, &scanstate->ss,
 						  RelationGetDescr(scanstate->ss.ss_currentRelation),
@@ -370,7 +374,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   node->modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -403,5 +408,7 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   node->modifies_rel);
 }
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e5b945a9ee3..01d2bda3f72 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4133,6 +4133,18 @@ GlobalVisTestFor(Relation rel)
 	return state;
 }
 
+/*
+ * Returns maybe_needed as a 32-bit TransactionId. Can be used in callers that
+ * need to compare transaction IDs to a single value and are okay with using
+ * the more conservative boundary.
+ */
+TransactionId
+OldestXminFromGlobalVisState(GlobalVisState *state)
+{
+	return XidFromFullTransactionId(state->maybe_needed);
+}
+
+
 /*
  * Return true if it's worth updating the accurate maybe_needed boundary.
  *
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..46ea8b8455c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer, bool allow_vmset);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..1d0b374b652 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
@@ -1631,6 +1637,13 @@ typedef struct SeqScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
 	Size		pscan_len;		/* size of parallel heap scan descriptor */
+
+	/*
+	 * Whether or not the query modifies the relation scanned by this node.
+	 * This is used to avoid the overhead of optimizations that are only
+	 * effective for tables not modified by the query.
+	 */
+	bool		modifies_rel;
 } SeqScanState;
 
 /* ----------------
@@ -1702,6 +1715,7 @@ typedef struct
  *		OrderByTypByVals   is the datatype of order by expression pass-by-value?
  *		OrderByTypLens	   typlens of the datatypes of order by expressions
  *		PscanLen		   size of parallel index scan descriptor
+ *		ModifiesBaseRel    true if query modifies base relation
  * ----------------
  */
 typedef struct IndexScanState
@@ -1731,6 +1745,7 @@ typedef struct IndexScanState
 	bool	   *iss_OrderByTypByVals;
 	int16	   *iss_OrderByTypLens;
 	Size		iss_PscanLen;
+	bool		iss_ModifiesBaseRel;
 } IndexScanState;
 
 /* ----------------
@@ -1888,6 +1903,7 @@ typedef struct SharedBitmapHeapInstrumentation
  *		pstate			   shared state for parallel bitmap scan
  *		sinstrument		   statistics for parallel workers
  *		recheck			   do current page's tuples need recheck
+ *		modifies_rel	   does the query modify the base relation
  * ----------------
  */
 typedef struct BitmapHeapScanState
@@ -1900,6 +1916,7 @@ typedef struct BitmapHeapScanState
 	ParallelBitmapHeapState *pstate;
 	SharedBitmapHeapInstrumentation *sinstrument;
 	bool		recheck;
+	bool		modifies_rel;
 } BitmapHeapScanState;
 
 /* ----------------
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..fcb10b8d136 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -101,6 +101,7 @@ extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid
 extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
+extern TransactionId OldestXminFromGlobalVisState(GlobalVisState *state);
 
 /*
  * Utility functions for implementing visibility routines in table AMs.
-- 
2.43.0
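
The way v4-0012 threads "does the query modify this relation" down to
heap_page_prune_opt() can be summarized in a small standalone sketch. This
is not PostgreSQL code: Buffer is reduced to an int, only the
SO_ALLOW_VM_SET bit from the patch is reproduced, and scan_flags_for() and
vmbuffer_arg() are hypothetical helper names used purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Buffer; 0 plays the role of InvalidBuffer. */
typedef int Buffer;
#define InvalidBuffer 0

/* Only the flag added by the patch is reproduced, with the patch's value. */
#define SO_ALLOW_VM_SET (1u << 10)

/*
 * Hypothetical helper: as in table_beginscan_vmset(), the flag is added
 * only when the query does not modify the scanned relation.
 */
static uint32_t
scan_flags_for(uint32_t base_flags, bool modifies_rel)
{
	if (!modifies_rel)
		base_flags |= SO_ALLOW_VM_SET;
	return base_flags;
}

/*
 * Hypothetical helper mirroring the call sites in heapam.c: derive
 * allow_vmset from the flags and pass a vmbuffer pointer only when the
 * scan is allowed to set the VM.
 */
static Buffer *
vmbuffer_arg(uint32_t flags, Buffer *scan_vmbuffer, bool *allow_vmset)
{
	Buffer	   *arg;

	*allow_vmset = (flags & SO_ALLOW_VM_SET) != 0;
	arg = *allow_vmset ? scan_vmbuffer : NULL;

	/* Same invariant the patch asserts inside heap_page_prune_opt(). */
	assert(*allow_vmset || arg == NULL);
	return arg;
}
```

With these stand-ins, a read-only scan ends up passing a pointer to its own
VM buffer slot, while a scan whose relation appears in es_modified_relids
passes NULL and false.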




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-07-13 18:34  Andrey Borodin <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andrey Borodin @ 2025-07-13 18:34 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>



> On 12 Jul 2025, at 03:19, Melanie Plageman <[email protected]> wrote:
> 
> remove the xl_heap_visible struct

Same goes for VISIBILITYMAP_XLOG_CATALOG_REL and XLOG_HEAP2_VISIBLE. But please do not rush to remove them; perhaps I will have a more exhaustive list later. For now the patch set is expected to be unpolished.
I just need to absorb all of the changes to form a high-level evaluation of the patch set's effect.

I'm still trying to grasp the connection between the first patch, which adds Assert(prstate->cutoffs), and the other patches.

Also, I'd prefer "page is not marked all-visible but visibility map bit is set in relation" to emit XX001 for monitoring reasons, but again, this is a small note, while I need a broader picture.

So far I do not see any general problems in delegating redo work from xl_heap_visible to other records. FWIW, I have observed several cases of VM corruption that might be connected to the fact that we log VM changes independently of the data changes that caused the VM to change. But I have no real evidence or understanding of what happened.


Best regards, Andrey Borodin.





* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-07-13 19:15  Melanie Plageman <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-07-13 19:15 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Sun, Jul 13, 2025 at 2:34 PM Andrey Borodin <[email protected]> wrote:
>
> > On 12 Jul 2025, at 03:19, Melanie Plageman <[email protected]> wrote:
> >
> > remove the xl_heap_visible struct
>
> Same goes for VISIBILITYMAP_XLOG_CATALOG_REL and XLOG_HEAP2_VISIBLE. But please do not rush to remove them; perhaps I will have a more exhaustive list later. For now the patch set is expected to be unpolished.
> I just need to absorb all of the changes to form a high-level evaluation of the patch set's effect.

I actually did remove those if you check the last version posted. I
did notice there is one remaining comment referring to
XLOG_HEAP2_VISIBLE I missed somehow, but the actual enums/macros were
removed already.

> I'm still trying to grasp the connection between the first patch, which adds Assert(prstate->cutoffs), and the other patches.

I added this because I noticed that cutoffs was used at that location
without first validating that it had been provided. The last patch in
the set, which sets the VM on access, changes where cutoffs are used,
so I noticed what I felt was a missing assert in master while
developing that patch.
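
For readers following along, the change in where cutoffs are used comes
down to the fallback v4-0012 adds when no cutoffs are supplied. A rough
standalone sketch, assuming simplified stand-in structs
(VacuumCutoffsSketch, GlobalVisStateSketch) and a hypothetical
choose_oldest_xmin() helper rather than the real PostgreSQL definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified stand-ins; the real structs carry many more fields. */
typedef struct
{
	TransactionId OldestXmin;
} VacuumCutoffsSketch;

typedef struct
{
	uint64_t	maybe_needed;	/* 64-bit (epoch-qualified) transaction ID */
} GlobalVisStateSketch;

/* Models OldestXminFromGlobalVisState(): keep the low 32 bits. */
static TransactionId
oldest_xmin_from_vis(const GlobalVisStateSketch *state)
{
	return (TransactionId) (state->maybe_needed & 0xFFFFFFFFu);
}

/*
 * Hypothetical helper: vacuum supplies cutoffs, on-access pruning does
 * not, so the latter falls back to the more conservative boundary.
 */
static TransactionId
choose_oldest_xmin(const VacuumCutoffsSketch *cutoffs,
				   const GlobalVisStateSketch *vistest)
{
	if (cutoffs != NULL)
		return cutoffs->OldestXmin;

	assert(vistest != NULL);
	return oldest_xmin_from_vis(vistest);
}
```

Because a value is always available this way, the later code can read
prstate.oldest_xmin unconditionally, which is why the v4-0012 hunks drop
the Assert(cutoffs) checks at those sites.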

> Also, I'd prefer "page is not marked all-visible but visibility map bit is set in relation" to emit XX001 for monitoring reasons, but again, this is a small note, while I need a broader picture.

Could you clarify what you mean by this? Are you talking about the
string representation of the visibility map bits in the WAL record
representations in heapdesc.c?

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-07-14 06:37  Andrey Borodin <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andrey Borodin @ 2025-07-14 06:37 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>



> On 14 Jul 2025, at 00:15, Melanie Plageman <[email protected]> wrote:
> 
>> 
>> Also, I'd prefer "page is not marked all-visible but visibility map bit is set in relation" to emit XX001 for monitoring reasons, but again, this is small note, while I need a broader picture.
> 
> Could you clarify what you mean by this? Are you talking about the
> string representation of the visibility map bits in the WAL record
> representations in heapdesc.c?

This might be a bit off-topic for this thread, but as long as the patch touches that code we can look into this too.

If the all-visible VM bit is set while the page is not all-visible, an index-only scan will return incorrect results. I have observed this inconsistency a few times in production.

Two persistent subsystems (the VM and the heap) contradict each other, which is why I consider this data corruption. Yes, we can repair the VM by assuming the heap to be the source of truth in this case. But we must also emit the ERRCODE_DATA_CORRUPTED (XX001) code into the logs. In many cases this will alert the on-call SRE.

To do so, I propose replacing elog(WARNING, ...) with ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), ...)).
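
As a rough illustration of the intent, here is a standalone model (not PostgreSQL code) of the check being discussed. It treats the heap page as the source of truth, clears the VM bits on a mismatch, and tags the warning with SQLSTATE XX001. Every name prefixed with model_ or MODEL_ is a hypothetical stand-in invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the visibility map bits. */
#define MODEL_VM_ALL_VISIBLE 0x01
#define MODEL_VM_ALL_FROZEN  0x02

/* SQLSTATE that corresponds to ERRCODE_DATA_CORRUPTED, as cited above. */
#define MODEL_SQLSTATE_DATA_CORRUPTED "XX001"

/*
 * Returns true if the VM claims all-visible while the heap page does not.
 * On such a mismatch the VM bits are cleared (the heap wins) and a warning
 * carrying the SQLSTATE is written to stderr so that log-based monitoring
 * could match on it.
 */
static bool
model_check_vm(bool page_all_visible, uint8_t *vmbits)
{
	assert(vmbits != NULL);

	if (!page_all_visible && (*vmbits & MODEL_VM_ALL_VISIBLE) != 0)
	{
		fprintf(stderr,
				"WARNING [%s]: page is not marked all-visible but "
				"visibility map bit is set\n",
				MODEL_SQLSTATE_DATA_CORRUPTED);
		*vmbits = 0;			/* repair: trust the heap page */
		return true;
	}
	return false;
}
```

In the server itself the proposal would only change how the existing warning is reported; the repair step in this model is a simplification of what vacuum already does.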


Best regards, Andrey Borodin.





* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-07-31 22:58  Melanie Plageman <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-07-31 22:58 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

Thanks for continuing to take a look, Andrey.

On Mon, Jul 14, 2025 at 2:37 AM Andrey Borodin <[email protected]> wrote:
>
> This might be a bit off-topic for this thread, but as long as the patch touches that code we can look into this too.
>
> If the all-visible VM bit is set while the page is not all-visible, an index-only scan will return incorrect results. I have observed this inconsistency a few times in production.

That's very unfortunate. I wonder what could be causing this. Do you
suspect a bug in Postgres? Or something wrong with the disk, etc?

> Two persistent subsystems (the VM and the heap) contradict each other, which is why I consider this data corruption. Yes, we can repair the VM by assuming the heap to be the source of truth in this case. But we must also emit the ERRCODE_DATA_CORRUPTED (XX001) code into the logs. In many cases this will alert the on-call SRE.
>
> To do so, I propose replacing elog(WARNING, ...) with ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), ...)).

Ah, you mean the warnings currently in lazy_scan_prune(). To me this
suggestion makes sense. I see at least one other example that uses
ERRCODE_DATA_CORRUPTED at an error level below ERROR.

I have attached a cleaned up and updated version of the patch set (it
doesn't yet include your suggested error message change).


What's new in this version
-----
In addition to general code, comment, and commit message improvements,
notable changes are as follows:

- I have used the GlobalVisState for determining if the whole page is
visible in a more natural way.

- I micro-benchmarked and identified some sources of regression in the
additional work SELECT queries would do to set the VM. So, there are
several new commits addressing these (for example, inlining several
functions and unsetting all-visible when we see a dead tuple if we
won't attempt freezing).

- Because heap_page_prune_and_freeze() was getting long, I added some
helper functions.


Performance impact of setting the VM on-access
-------
I found that with the patch set applied, we set many pages all-visible
in the VM on access, resulting in a higher overall number of pages set
all-visible, reducing load for vacuum, and dramatically decreasing
heap fetches by index-only scans.

I devised a simple benchmark -- with 8 workers inserting 20 rows at a
time into a table with a few columns and updating a single row that
they just inserted. Another worker queries the table once per second
using an index.

After running the benchmark for a few minutes, though the table was
autovacuumed several times in both cases, with the patchset applied,
15% more blocks were all-visible at the end of the benchmark.

And with my patch applied, index-only scans did far fewer heap
fetches. A SELECT count(*) of the table at the same point in the
benchmark did 10,000 heap fetches on master and 500 with the patch
applied (I used auto_explain to determine this).

With my patch applied, autovacuum workers write half as much WAL as on
master. Some of this is courtesy of other patches in the set which
eliminate separate WAL records for setting the page all-visible. But,
vacuum is also scanning fewer pages and dirtying fewer buffers because
they are being set all-visible on-access.

There are more details about the benchmark at the end of the email.
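
For the WAL side of that reduction, the following is a minimal standalone sketch of how a combined record can carry optional fields behind a flags byte, in the spirit of the v5-0003 patch, where the VM flags follow the optional snapshot conflict horizon in the record's main data. The flag values, struct, and function names are hypothetical and are not the real xl_heap_prune layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; the real XLHP_* values are not reproduced here. */
#define MODEL_HAS_CONFLICT_HORIZON 0x01
#define MODEL_HAS_VMFLAGS          0x02

/* Largest encoding: flags byte + 4-byte xid + vmflags byte. */
#define MODEL_MAX_ENCODED_SIZE 6

typedef struct ModelPruneRecord
{
	uint8_t		flags;
	uint32_t	conflict_xid;	/* 0 means "not present" in this model */
	uint8_t		vmflags;		/* 0 means "VM not updated" */
} ModelPruneRecord;

/* Write only the fields that are present; return the encoded length. */
static size_t
model_encode(uint32_t conflict_xid, uint8_t vmflags, uint8_t *buf)
{
	uint8_t		flags = 0;
	uint8_t	   *p = buf + 1;	/* byte 0 is reserved for the flags */

	assert(buf != NULL);
	if (conflict_xid != 0)
	{
		flags |= MODEL_HAS_CONFLICT_HORIZON;
		memcpy(p, &conflict_xid, sizeof(conflict_xid));
		p += sizeof(conflict_xid);
	}
	if (vmflags != 0)
	{
		flags |= MODEL_HAS_VMFLAGS;
		*p++ = vmflags;
	}
	buf[0] = flags;
	return (size_t) (p - buf);
}

/* Read the fields back in the same order, guided by the flags byte. */
static ModelPruneRecord
model_decode(const uint8_t *buf)
{
	ModelPruneRecord rec = {0};
	const uint8_t *p = buf;

	rec.flags = *p++;
	if (rec.flags & MODEL_HAS_CONFLICT_HORIZON)
	{
		memcpy(&rec.conflict_xid, p, sizeof(rec.conflict_xid));
		p += sizeof(rec.conflict_xid);
	}
	if (rec.flags & MODEL_HAS_VMFLAGS)
		rec.vmflags = *p;
	return rec;
}
```

In this model, a record that only sets the VM costs one extra byte on top of the flags, rather than a whole separate record with its own header.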


Setting pd_prune_xid on insert
------
The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
patch in the set. It sets pd_prune_xid on insert (so pages filled by
COPY or insert can also be set all-visible in the VM before they are
vacuumed). I gave it a .txt extension because it currently fails
035_standby_logical_decoding due to a recovery conflict. I need to
investigate more to see if this is a bug in my patch set or elsewhere
in Postgres.

Besides the failing test, I have a feeling that my current heuristic
for whether or not to set the VM on-access is not quite right for
pages that have only been inserted to -- and if we get it wrong, we've
wasted those CPU cycles because we didn't otherwise need to prune the
page.
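
To make the hint concrete, here is a small standalone model of the guard used in the insert path of that patch: the prunable hint is only set for normal XIDs, is skipped when the page is already being set all-frozen, and only ever moves to an older XID. The types and names are hypothetical stand-ins; the comparison assumes both XIDs are normal and ignores the special cases the server handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ModelXid;

#define MODEL_INVALID_XID      ((ModelXid) 0)
#define MODEL_FIRST_NORMAL_XID ((ModelXid) 3)	/* 1 and 2 are special XIDs */

typedef struct ModelPage
{
	ModelXid	prune_xid;		/* stand-in for pd_prune_xid */
} ModelPage;

static bool
model_xid_is_normal(ModelXid xid)
{
	return xid >= MODEL_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is older than b"; valid for normal XIDs only. */
static bool
model_xid_precedes(ModelXid a, ModelXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID seen so far as the page's prune hint. */
static void
model_page_set_prunable(ModelPage *page, ModelXid xid)
{
	assert(model_xid_is_normal(xid));
	if (page->prune_xid == MODEL_INVALID_XID ||
		model_xid_precedes(xid, page->prune_xid))
		page->prune_xid = xid;
}

/* Guard modeled on the insert path: skip frozen pages and bootstrap XIDs. */
static void
model_hint_on_insert(ModelPage *page, ModelXid xid, bool all_frozen_set)
{
	if (!all_frozen_set && model_xid_is_normal(xid))
		model_page_set_prunable(page, xid);
}
```

The model says nothing about when on-access pruning should actually fire, which is the heuristic still in question above.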


- Melanie


Benchmark
-------
psql -c "
DROP TABLE IF EXISTS simple_table;

CREATE TABLE simple_table (
    id SERIAL PRIMARY KEY,
    group_id INT NOT NULL,
    data TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);

create index on simple_table(group_id);
"

pgbench \
  --no-vacuum \
  --random-seed=0 \
  -c 8 \
  -j 8 \
  -M prepared \
  -T 200 \
  > "pgbench_run_summary_update_${version}" \
-f- <<EOF &
\set gid random(1,1000)

INSERT INTO simple_table (group_id, data)
  SELECT :gid, 'inserted'
  RETURNING id \gset

update simple_table set data = 'updated' where id = :id;

insert into simple_table (group_id, data)
  select :gid, 'inserted'
  from generate_series(1,20);
EOF
insert_pid=$!

pgbench \
  --no-vacuum \
  --random-seed=0 \
  -c 1 \
  -j 1 \
  --rate=1 \
  -M prepared \
  -T 200 \
  > "pgbench_run_summary_select_${version}" \
-f- <<EOF
\set gid random(1, 1000)
select max(created_at) from simple_table where group_id = :gid;
select count(*) from simple_table where group_id = :gid;
EOF

wait $insert_pid

From 058df21a6da05956bbf3a0a45db575d83a515002 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v5 20/20] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

ci-os-only:
---
 src/backend/access/heap/heapam.c      | 25 +++++++++++++++++--------
 src/backend/access/heap/heapam_xlog.c | 15 ++++++++++++++-
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f90b014a9b0..e0f2245052c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2094,6 +2094,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2153,15 +2154,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2171,7 +2176,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2534,8 +2538,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 64f06d46bf1..234e9a401b9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -473,6 +473,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -622,9 +628,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
-- 
2.43.0



Attachments:

  [text/plain] Set-pd_prune_xid-on-insert.txt (4.5K, 2-Set-pd_prune_xid-on-insert.txt)

  [text/x-patch] v5-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (24.4K, 3-v5-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
  download | inline diff:
From 053a650299b860242664accc703f46b711807901 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v5 03/20] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
 src/backend/access/heap/heapam_xlog.c  | 142 ++++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 149 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  13 ++-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   3 +
 6 files changed, 296 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = (Page) BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbyte(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() for callers that expect
+ * no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets has changed, that's wrong */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool vm_modified_heap_page,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* The record also sets VM bits; the new vmflags follow in the main data */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0


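The heapdesc.c hunk above walks a cursor (maindataptr) over the optional
fields that follow the fixed-size part of the record: first the conflict
XID, if present, then the VM flags byte, if present. A rough standalone
sketch of that cursor pattern follows. The TOY_* flag values, the
ToyPruneFields struct, and toy_parse_optional_fields() are all hypothetical
and simplified; they do not reproduce the real xl_heap_prune layout or flag
bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; not the real XLHP_* values. */
#define TOY_HAS_VMFLAGS			(1 << 0)
#define TOY_HAS_CONFLICT_XID	(1 << 1)

typedef struct ToyPruneFields
{
	bool		has_conflict_xid;
	uint32_t	conflict_xid;
	bool		has_vmflags;
	uint8_t		vmflags;
} ToyPruneFields;

/*
 * Decode the optional fields that follow the fixed-size header.  "data"
 * points just past the header.  Fields are unaligned, so each one is copied
 * out with memcpy() and the cursor is advanced by the field's size.
 *
 * Returns the number of bytes consumed.
 */
size_t
toy_parse_optional_fields(uint8_t flags, const char *data,
						  ToyPruneFields *out)
{
	const char *cursor = data;

	memset(out, 0, sizeof(*out));

	if (flags & TOY_HAS_CONFLICT_XID)
	{
		memcpy(&out->conflict_xid, cursor, sizeof(uint32_t));
		cursor += sizeof(uint32_t);
		out->has_conflict_xid = true;
	}

	if (flags & TOY_HAS_VMFLAGS)
	{
		memcpy(&out->vmflags, cursor, sizeof(uint8_t));
		cursor += sizeof(uint8_t);
		out->has_vmflags = true;
	}

	return (size_t) (cursor - data);
}
```

Advancing one shared cursor, rather than recomputing each offset from the
start of the record, keeps the decoding correct whichever combination of
optional fields is present.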

  [text/x-patch] v5-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (5.8K, 4-v5-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 291de3c976a1312b86156d3e4e984eb66808b9b8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v5 04/20] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0



  [text/x-patch] v5-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (10.8K, 5-v5-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
From f98373090c6281d0278bfc7ffd407bad274c302d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v5 01/20] Eliminate xl_heap_visible in COPY FREEZE

Rather than emitting a separate xl_heap_visible WAL record to set the VM
bits, specify the changes to make to the VM block in the
xl_heap_multi_insert record.
---
 src/backend/access/heap/heapam.c        | 47 ++++++++++---------
 src/backend/access/heap/heapam_xlog.c   | 39 +++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 62 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 132 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 8f918e00af7..0bc64203959 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page and log
+ *		visibilitymap_set_vmbyte - set a bit in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -318,6 +319,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed-in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusively locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0



  [text/x-patch] v5-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.1K, 6-v5-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From a18fe6f8169af3c4e286a3dc3332ab31108998ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v5 02/20] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [text/x-patch] v5-0006-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 7-v5-0006-Combine-vacuum-phase-I-VM-update-cases.patch)
From 82343294b239425abb298358a5881f9308f7ec08 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v5 06/20] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..f6cdd9e6828 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM.  Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2204,66 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
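
The combined condition in the patch above can be read as a small decision function. The following is a minimal, standalone sketch of that reading, not code from the patch: the names prefixed with sketch_/SKETCH_ are hypothetical, the flag values are assumed to mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and the VM lookup done by VM_ALL_FROZEN() is replaced by a plain boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real VM flag bits. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/*
 * Return the VM bits that should be set for a page, or 0 if no VM update
 * is needed.  all_frozen is only meaningful when all_visible is true, so
 * all_visible is checked first.
 */
static uint8_t
sketch_vm_flags_to_set(bool page_all_visible, bool page_all_frozen,
                       bool vm_says_all_visible, bool vm_says_all_frozen)
{
    uint8_t flags;

    if (!page_all_visible)
        return 0;

    /* VM already all-visible and the frozen bit does not need setting */
    if (vm_says_all_visible && !(page_all_frozen && !vm_says_all_frozen))
        return 0;

    flags = SKETCH_VM_ALL_VISIBLE;
    if (page_all_frozen)
        flags |= SKETCH_VM_ALL_FROZEN;
    return flags;
}

/*
 * Mirror of the "!PageIsAllVisible(page) || XLogHintBitIsNeeded()" test:
 * the heap page must be dirtied if its page-level bit changes, or if hint
 * bit changes are WAL-logged (checksums or wal_log_hints enabled).
 */
static bool
sketch_must_dirty_heap_page(bool pd_all_visible, bool hint_bits_wal_logged)
{
    return !pd_all_visible || hint_bits_wal_logged;
}
```

Under these assumptions, a page that is all-visible and all-frozen, and whose VM entry has only the all-visible bit set, yields both bits; with checksums enabled the heap page is dirtied even though PD_ALL_VISIBLE is already set, which is the case the commit message describes as previously missed.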



  [text/x-patch] v5-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 8-v5-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 033dd160216fb21473adb94c61286b34dc0abd36 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v5 07/20] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 72 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f6cdd9e6828..0c121fdf4e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0
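
As a rough illustration of how the new option bit gates the corruption check, here is a standalone sketch. It is not taken from the patch: the SKETCH_ macros are hypothetical copies of the three HEAP_PAGE_PRUNE_* bit positions shown in heapam.h above, and the two helper functions are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copies of the option bits defined in heapam.h above. */
#define SKETCH_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define SKETCH_PRUNE_FREEZE          (1 << 1)
#define SKETCH_PRUNE_UPDATE_VM       (1 << 2)

/*
 * Build the options word the way lazy_scan_prune() does in the patch:
 * vacuum always freezes and updates the VM, and may mark items unused
 * immediately when the relation has no indexes.
 */
static int
sketch_vacuum_prune_options(int nindexes)
{
    int options = SKETCH_PRUNE_FREEZE | SKETCH_PRUNE_UPDATE_VM;

    if (nindexes == 0)
        options |= SKETCH_PRUNE_MARK_UNUSED_NOW;
    return options;
}

/* The VM corruption check only runs when the caller asked for VM updates. */
static bool
sketch_should_check_vm(int options)
{
    return (options & SKETCH_PRUNE_UPDATE_VM) != 0;
}
```

On-access pruning passes 0 for the options in this patch, so under this sketch it never reaches the VM corruption check, while vacuum always does.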



  [text/x-patch] v5-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 9-v5-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From c2d9153bfcf2a4c4d703bbfdd262dd21a6172c9d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v5 05/20] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0
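
The two corruption cases extracted by this patch can be modelled without any PostgreSQL headers. The sketch below is only an illustration under stated assumptions: SketchPageState and sketch_fix_vm_corruption() are hypothetical names, the struct fields stand in for PD_ALL_VISIBLE, the current VM status, and the LP_DEAD count, and the WARNING messages and buffer dirtying of the real function are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the state the real function inspects. */
typedef struct SketchPageState
{
    bool        pd_all_visible; /* page-level PD_ALL_VISIBLE bit */
    bool        vm_bits_set;    /* any VM bit currently set for the block */
    int         lpdead_items;   /* LP_DEAD items left on the page */
} SketchPageState;

/*
 * Return true if corruption was found and cleared, in which case the
 * caller skips the normal VM update for this page.
 */
static bool
sketch_fix_vm_corruption(SketchPageState *st, bool blk_known_av)
{
    /*
     * Case 1: the VM was seen as set earlier, the page-level bit is clear,
     * and a recheck shows the VM bit is still set.  Clear the VM bits.
     */
    if (blk_known_av && !st->pd_all_visible && st->vm_bits_set)
    {
        st->vm_bits_set = false;
        return true;
    }

    /*
     * Case 2: LP_DEAD items on a page marked all-visible.  Clear both the
     * page-level bit and the VM bits.
     */
    if (st->lpdead_items > 0 && st->pd_all_visible)
    {
        st->pd_all_visible = false;
        st->vm_bits_set = false;
        return true;
    }

    return false;
}
```

Ordering the checks this way matches the extracted function: the stale-VM case is tested first, and a consistent page (page-level bit set, VM bit set, no LP_DEAD items) falls through and returns false.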



  [text/x-patch] v5-0009-Update-VM-in-pruneheap.c.patch (12.7K, 10-v5-0009-Update-VM-in-pruneheap.c.patch)
From 473633011ff4448cf7332de529ca235f5802c749 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v5 09/20] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 107 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6c3653e776c..05227ce0339 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c121fdf4e6..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,88 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
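
For readers following the vacuumlazy.c hunk in the patch above: the
logging counters are now derived only from the VM bits before and after
the update (presult.old_vmbits and presult.new_vmbits). Below is a
minimal standalone sketch of that classification. VmCounters and
count_vm_transition() are hypothetical names used purely for
illustration (the patch increments fields of LVRelState directly), and
the two flag values are redefined locally, assumed to match
visibilitymapdefs.h, so the snippet compiles outside the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Redefined locally for the sketch; assumed to match visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the three counters kept in LVRelState. */
typedef struct VmCounters
{
	int64_t		new_visible_pages;
	int64_t		new_visible_frozen_pages;
	int64_t		new_frozen_pages;
} VmCounters;

/*
 * Classify one page's VM transition and bump the matching counters.
 * Returns 1 if the page was newly set all-frozen (the value the patch
 * stores in *vm_page_frozen), otherwise 0.
 */
static int
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page was not all-visible before and is now. */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			return 1;
		}
		return 0;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Already all-visible; only the all-frozen bit is new. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		return 1;
	}

	/* No VM update happened (new_vmbits == 0) or nothing changed. */
	return 0;
}
```

Because new_vmbits is left at 0 when the VM is not updated, a page that
was already all-visible and is not touched falls through both branches
and changes no counter.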



  [text/x-patch] v5-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (3.0K, 11-v5-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch)
From dfe004443fabc70f586e0073b4b6f07d687e185b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v5 08/20] Keep all_frozen updated too in
 heap_page_prune_and_freeze

Previously, all_frozen was only considered valid when all_visible was
also set, so it was not cleared every time all_visible was cleared.
Clear both together so that all_frozen is always up to date.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..6c3653e776c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0
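
The invariant introduced by the patch above (all_frozen implies
all_visible, checked by the new Assert) can be sketched in isolation as
follows. VisState and its helper functions are hypothetical names for
illustration only; in the patch the two flags live in PruneState and
are cleared inline in heap_prune_record_unchanged_lp_normal().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two visibility flags in PruneState. */
typedef struct VisState
{
	bool		all_visible;
	bool		all_frozen;
} VisState;

/* Start optimistic, as the freeze path does before scanning tuples. */
static void
vis_state_init(VisState *s)
{
	s->all_visible = true;
	s->all_frozen = true;
}

/* A tuple not visible to all clears both flags together. */
static void
vis_state_mark_not_visible(VisState *s)
{
	s->all_visible = s->all_frozen = false;
}

/* A visible but unfrozen tuple only clears all_frozen. */
static void
vis_state_mark_not_frozen(VisState *s)
{
	s->all_frozen = false;
}

/* Mirrors Assert(!prstate.all_frozen || prstate.all_visible). */
static bool
vis_state_is_consistent(const VisState *s)
{
	return !s->all_frozen || s->all_visible;
}
```

With both flags cleared in one place, a later reader of all_frozen does
not need to remember to test all_visible first.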



  [text/x-patch] v5-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (28.5K, 12-v5-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From 1960da345a3fba00d668a23204684a75f08b0d05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v5 10/20] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 454 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 278 insertions(+), 221 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05227ce0339..cf9e5215d6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping. Initializing all_visible to false allows skipping the work
+	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
+
+			/*
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
+			 */
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with a younger xmax than our so far
+			 * calculated conflict_xid, we must use this as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
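
The old_vmbits and new_vmbits fields that this patch keeps in
PruneFreezeResult let a caller work out which visibility map bits were
newly set for a page, for example when counting pages for logging. The
following is a minimal standalone sketch of that classification, not
code from the patch. The TOY_* names and toy_classify_vm_change() are
hypothetical, and the bit values are assumed to match those in
access/visibilitymapdefs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the bit values in access/visibilitymapdefs.h. */
#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02

/* Hypothetical classification of a page's VM transition. */
typedef enum ToyVmTransition
{
	TOY_VM_UNCHANGED,
	TOY_VM_NEWLY_VISIBLE,
	TOY_VM_NEWLY_VISIBLE_AND_FROZEN,
	TOY_VM_NEWLY_FROZEN
} ToyVmTransition;

/*
 * Compare the VM bits before and after pruning and report what was newly
 * set.  A page that was already all-visible can only become newly frozen.
 */
static ToyVmTransition
toy_classify_vm_change(uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		was_visible = (old_vmbits & TOY_VM_ALL_VISIBLE) != 0;
	bool		was_frozen = (old_vmbits & TOY_VM_ALL_FROZEN) != 0;
	bool		now_visible = (new_vmbits & TOY_VM_ALL_VISIBLE) != 0;
	bool		now_frozen = (new_vmbits & TOY_VM_ALL_FROZEN) != 0;

	if (!was_visible && now_visible)
		return now_frozen ? TOY_VM_NEWLY_VISIBLE_AND_FROZEN
			: TOY_VM_NEWLY_VISIBLE;

	if (!was_frozen && now_frozen)
		return TOY_VM_NEWLY_FROZEN;

	return TOY_VM_UNCHANGED;
}
```

A caller would pass presult.old_vmbits and presult.new_vmbits and bump
the matching counter. The sketch treats new_vmbits == 0 as "nothing was
set", which is an assumption about how the fields are filled in when the
VM is not updated.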



  [text/x-patch] v5-0011-Rename-PruneState.freeze-to-attempt_freeze.patch (4.1K, 13-v5-0011-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 92232d63451af2ffb3eeaa2dfe9c6e83ce7ba938 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v5 11/20] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples, not that we
will ultimately end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cf9e5215d6b..82127e8728b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 
 	/*
 	 * Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	bool		all_frozen_except_lp_dead = false;
 	bool		set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
 	 * tuple to know whether or not the page will be totally frozen.
 	 */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v5-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch (10.5K, 14-v5-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch)
From 67597b88b4127d767db8ca32d1e29cd4ec79a070 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v5 14/20] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page. If we have
maintained the visibility_cutoff_xid, we then compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, as we may
have set all_visible to false sooner when encountering a live tuple
newer than OldestXmin. However, these extra comparisons were found not
to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 17 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 715dfc16ba7..ab79d8a3ed9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a0fa371a06..777ec30eb82 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-												   TransactionId OldestXmin,
+												   GlobalVisState *vistest,
 												   OffsetNumber *deadoffsets,
 												   int allowed_num_offsets,
 												   bool *all_frozen,
@@ -2716,7 +2716,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
-											   vacrel->cutoffs.OldestXmin,
+											   vacrel->vistest,
 											   deadoffsets, num_offsets,
 											   &all_frozen, &visibility_cutoff_xid,
 											   &vacrel->offnum))
@@ -3459,13 +3459,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+	return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
 												  NULL, 0,
 												  all_frozen,
 												  visibility_cutoff_xid,
@@ -3500,7 +3500,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-									   TransactionId OldestXmin,
+									   GlobalVisState *vistest,
 									   OffsetNumber *deadoffsets,
 									   int allowed_num_offsets,
 									   bool *all_frozen,
@@ -3555,8 +3555,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3575,8 +3575,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
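
The once-per-page check described in the commit message above can be
illustrated in isolation. This is a simplified sketch under stated
assumptions, not the patch's code: the toy_* names are hypothetical,
XIDs are modeled as plain 32-bit values with wraparound-aware
comparison, and "visible to all" is reduced to "precedes a single
horizon XID", whereas the real GlobalVisState tracks more than one
bound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* True if a is logically newer than b, allowing for XID wraparound. */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (!toy_xid_is_normal(a) || !toy_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Toy stand-in for the horizon test: xid must precede the horizon. */
static bool
toy_xid_visible_to_all(ToyXid horizon, ToyXid xid)
{
	return toy_xid_follows(horizon, xid);
}

/*
 * Track only the newest normal xmin while walking the live tuples, then
 * test that single value against the horizon once for the whole page.
 */
static bool
toy_page_all_visible(const ToyXid *xmins, size_t ntuples, ToyXid horizon)
{
	ToyXid		newest = TOY_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (toy_xid_is_normal(xmins[i]) &&
			toy_xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	/* No normal xmin (empty or all-frozen page): nothing to compare. */
	if (!toy_xid_is_normal(newest))
		return true;

	return toy_xid_visible_to_all(horizon, newest);
}
```

The trade-off is the one the commit message names: the loop no longer
stops at the first tuple that is too new, so it may look at more xmins,
but the more expensive horizon test runs at most once per page.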



  [text/x-patch] v5-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (7.1K, 15-v5-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch)
From 2b99c5954eaa99bad8efebc1eb0289b42469eee2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v5 13/20] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all for the purpose of determining whether the page can be
set all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().

Also rename GlobalVisTestIsRemovableFullXid() to
GlobalVisFullXidVisible() to match.
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ffc12314b41..715dfc16ba7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..4b8e5747239 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf987aed8d3..508bb379f87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..aec0692b5db 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -97,8 +97,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v5-0012-Remove-xl_heap_visible-entirely.patch (24.3K, 16-v5-0012-Remove-xl_heap_visible-entirely.patch)
From 632ace2402679e28a3af367d16434523135402a0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v5 12/20] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 154 +----------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 106 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 30 insertions(+), 365 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..64f06d46bf1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -82,10 +82,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		memcpy(&vmflags, maindataptr, sizeof(uint8));
 		maindataptr += sizeof(uint8);
 
-		/*
-		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
-		 * because we already have XLHP_IS_CATALOG_REL.
-		 */
 		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 		/* Must never set all_frozen bit without also setting all_visible bit */
 		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -267,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +650,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_set_vmbyte(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -1380,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 82127e8728b..ffc12314b41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbyte(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0bc64203959..5ed54e06dd4 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page and log
- *      visibilitymap_set_vmbyte - set a bit in a pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -338,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * making any changes needed to the associated heap page.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..a64677b7bca 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -438,20 +437,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -495,11 +480,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..98b1adc4e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4274,7 +4274,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
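
For readers following the rename in 0012, here is a minimal standalone sketch of the bit arithmetic that the surviving visibilitymap_set() performs: two bits per heap block, read with a shift and mask, set with an OR, returning the previous bits. The ToyVM type, the toy_vm_* names, and the 16-byte map size are hypothetical and exist only for illustration. Buffer pinning, locking, MarkBufferDirty() and WAL are deliberately left out, so this is not the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as defined in visibilitymapdefs.h. */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02
#define VM_VALID_BITS	0x03

/* Two bits per heap block, so four heap blocks share one map byte. */
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/* Hypothetical toy map covering 64 heap blocks; not a real VM page. */
#define TOY_MAP_BYTES	16

typedef struct ToyVM
{
	uint8_t		map[TOY_MAP_BYTES];
} ToyVM;

static void
toy_vm_init(ToyVM *vm)
{
	memset(vm->map, 0, sizeof(vm->map));
}

/* Return the two status bits currently stored for heap_blk. */
static uint8_t
toy_vm_get(const ToyVM *vm, uint32_t heap_blk)
{
	uint32_t	byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(byte < TOY_MAP_BYTES);
	return (vm->map[byte] >> shift) & VM_VALID_BITS;
}

/* Set flags for heap_blk and return the bits that were set beforehand. */
static uint8_t
toy_vm_set(ToyVM *vm, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		old;

	assert(byte < TOY_MAP_BYTES);
	assert((flags & VM_VALID_BITS) == flags);
	/* Must never set all_frozen without also setting all_visible. */
	assert(flags != VM_ALL_FROZEN);

	old = (vm->map[byte] >> shift) & VM_VALID_BITS;
	if (old != flags)
		vm->map[byte] |= (uint8_t) (flags << shift);
	return old;
}
```

The returned old bits are what let callers such as the prune/freeze redo routine decide whether anything changed, and therefore whether the VM page LSN needs to be advanced.
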



  [text/x-patch] v5-0015-Inline-TransactionIdFollows-Precedes.patch (4.9K, 17-v5-0015-Inline-TransactionIdFollows-Precedes.patch)
From 5ca49d81544be2dd5502d5509fe09325df9d0857 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v5 15/20] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
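
As a side note on 0015, the comparison being inlined is easy to try outside the tree. The sketch below copies the logic of TransactionIdPrecedes() into a standalone file; the xid_precedes and xid_compare_demo names are hypothetical, and the TransactionId typedef and TransactionIdIsNormal macro are simplified stand-ins for the definitions in transam.h. It is only meant to show why the signed difference handles wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs 0, 1 and 2 are permanent (invalid, bootstrap, frozen). */
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Is id1 logically < id2?  Same logic as TransactionIdPrecedes(). */
static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Permanent XIDs compare as plain unsigned integers. */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;

	/*
	 * Normal XIDs compare modulo 2^32: the unsigned subtraction wraps, and
	 * reading the result as a signed value tells us which side of the
	 * circle id1 is on relative to id2.  This relies on the usual
	 * two's-complement conversion.
	 */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

/* Small self-check showing ordinary and wrapped comparisons. */
static void
xid_compare_demo(void)
{
	/* No wraparound: behaves like a plain "<". */
	assert(xid_precedes(100, 200));
	assert(!xid_precedes(200, 100));

	/* An XID just before wraparound is older than a small one after it. */
	assert(xid_precedes(4294967000u, 100));
	assert(!xid_precedes(100, 4294967000u));

	/* A permanent XID (2) is compared without modulo arithmetic. */
	assert(xid_precedes(2, 100));
}
```

Because the function is only a branch, a subtraction and a sign test, a static inline definition in a header lets the compiler fold it into hot loops such as on-access pruning, which is the overhead the commit message refers to.
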



  [text/x-patch] v5-0016-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 18-v5-0016-Unset-all-visible-sooner-if-not-freezing.patch)
From 474ed6ba17773f557cd9fbf196388ffb6a7b7c4e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v5 16/20] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab79d8a3ed9..80d055e5376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
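
The bookkeeping change in 0016 can be illustrated with a toy model. ToyPruneState and the toy_* functions below are hypothetical and keep only the fields named in the diff (attempt_freeze, all_visible, all_frozen, lpdead_items). The end-of-page step is an assumption standing in for the deferred unset that the comments say happens at the end of heap_page_prune_and_freeze().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for PruneState. */
typedef struct ToyPruneState
{
	bool		attempt_freeze;	/* will this prune pass try to freeze? */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} ToyPruneState;

/* Called for each dead item found on the page. */
static void
toy_record_dead(ToyPruneState *ps)
{
	/* all_frozen only makes sense on a page still considered all-visible */
	assert(!ps->all_frozen || ps->all_visible);

	/*
	 * When freezing may be attempted, keep all_visible as-is for now so
	 * that dead items do not rule out freezing the rest of the page.
	 * Otherwise (the on-access case) give up on all-visible immediately.
	 */
	if (!ps->attempt_freeze)
		ps->all_visible = ps->all_frozen = false;

	ps->lpdead_items++;
}

/*
 * Assumed end-of-page step: any page that still has dead items cannot be
 * reported as all-visible, so the deferred unset happens here.
 */
static void
toy_finish_page(ToyPruneState *ps)
{
	if (ps->lpdead_items > 0)
		ps->all_visible = ps->all_frozen = false;
}
```

Both paths end with the same answer for a page containing dead items. The difference is only when the flags are cleared, which lets the non-freezing path skip further all-visible bookkeeping for the rest of the page.
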



  [text/x-patch] v5-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (25.8K, 19-v5-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 262dd663f1cb7fbbf84865e5bccf890c15762412 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v5 17/20] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c          | 15 +++++-
 src/backend/access/heap/heapam_handler.c  | 15 +++++-
 src/backend/access/heap/pruneheap.c       | 63 ++++++++++++++++++-----
 src/backend/access/index/indexam.c        | 46 +++++++++++++++++
 src/backend/access/table/tableam.c        | 39 ++++++++++++--
 src/backend/executor/execMain.c           |  4 ++
 src/backend/executor/execUtils.c          |  2 +
 src/backend/executor/nodeBitmapHeapscan.c |  7 ++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++---
 src/backend/executor/nodeSeqscan.c        | 24 +++++++--
 src/include/access/genam.h                | 11 ++++
 src/include/access/heapam.h               | 24 +++++++--
 src/include/access/relscan.h              |  6 +++
 src/include/access/tableam.h              | 30 ++++++++++-
 src/include/nodes/execnodes.h             |  6 +++
 15 files changed, 273 insertions(+), 37 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..f90b014a9b0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..c68283de6f2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80d055e5376..dad341cb265 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..15e1853027b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -362,6 +367,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -369,8 +375,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -400,8 +409,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read the current heap page's
+	 * corresponding VM block into this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read the current heap page's corresponding VM block into
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..326d7d78860 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
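
As a rough illustration of how the patch above carries "does this query modify the relation" from the executor down to the scan, here is a minimal sketch. It assumes a plain 64-bit mask in place of the Bitmapset es_modified_relids, and every MODEL_ and model_ name is hypothetical rather than PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; only the idea of a per-scan bit matters here. */
#define MODEL_SO_TYPE_SEQSCAN	(1u << 0)
#define MODEL_SO_ALLOW_VM_SET	(1u << 10)

/* Add a range-table index to the set (this model handles indexes below 64). */
static uint64_t
model_add_member(uint64_t set, unsigned rti)
{
	return rti < 64 ? (set | (UINT64_C(1) << rti)) : set;
}

/* Is the given range-table index in the set of modified relations? */
static bool
model_is_member(uint64_t set, unsigned rti)
{
	return rti < 64 && ((set >> rti) & 1) != 0;
}

/*
 * Compute scan flags the way the *_vmset() entry points do: setting the
 * VM is allowed only when the scanned relation is not modified.
 */
static uint32_t
model_seqscan_flags(uint64_t modified_relids, unsigned scanrelid)
{
	uint32_t	flags = MODEL_SO_TYPE_SEQSCAN;

	if (!model_is_member(modified_relids, scanrelid))
		flags |= MODEL_SO_ALLOW_VM_SET;

	return flags;
}
```

In the patch the set is filled in two places: ExecInitResultRelation() for result relations and InitPlan() for relations with row marks.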


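The on-access heuristic added to heap_page_prune_and_freeze() in the patch above can be read as a single predicate. The sketch below models it with plain booleans. ModelPageState and model_skip_vm_update() are hypothetical names, and buffer_is_dirty and needs_fpi stand in for the results of BufferIsDirty() and XLogCheckBufferNeedsBackup().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs the heuristic looks at. */
typedef struct ModelPageState
{
	bool		on_access;		/* reason == PRUNE_ON_ACCESS */
	bool		consider_update_vm;
	bool		all_visible;
	bool		do_prune;
	bool		do_freeze;
	bool		buffer_is_dirty;	/* heap page is already dirty */
	bool		needs_fpi;		/* WAL-logging would need a full-page image */
} ModelPageState;

/*
 * Return true when the VM update should be skipped: an on-access call
 * that neither prunes nor freezes, where setting the VM would newly
 * dirty the heap page or force a full-page image.
 */
static bool
model_skip_vm_update(const ModelPageState *s)
{
	return s->on_access &&
		s->consider_update_vm &&
		s->all_visible &&
		!s->do_prune && !s->do_freeze &&
		(!s->buffer_is_dirty || s->needs_fpi);
}
```

When the predicate holds, the patch also clears consider_update_vm, all_visible and all_frozen so that later code does not act on them.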

  [text/x-patch] v5-0018-Add-helper-functions-to-heap_page_prune_and_freez.patch (18.9K, 20-v5-0018-Add-helper-functions-to-heap_page_prune_and_freez.patch)
From d239dd8a66eee4e0b0dac4dc1e068b71ba219ac7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v5 18/20] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine, based on caller-provided options,
   heuristics, and state gathered during stage 2, whether or not to
   freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
 1 file changed, 295 insertions(+), 176 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dad341cb265..5d943b0c64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
-
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
-	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
-	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
 	/* Save these for the caller in case we later zero out vmflags */
 	presult->new_vmbits = vmflags;
 
-	/* Any error while applying the changes is critical */
+	/*
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
+	 */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0
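
A minimal standalone sketch of the decision that heap_page_will_freeze() makes in the patch above, for readers who want the heuristic without the surrounding diff. It is not part of the patch set. The FreezeInputs struct and sketch_will_freeze() are hypothetical names, and the boolean fields stand in for calls such as RelationNeedsWAL(), XLogHintBitIsNeeded() and XLogCheckBufferNeedsBackup(), which cannot be called outside the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the state the patch consults. */
typedef struct FreezeInputs
{
	bool		attempt_freeze;		/* caller asked us to consider freezing */
	bool		freeze_required;	/* an XID/MXID older than the cutoffs exists */
	bool		all_visible;		/* page is all-visible, ignoring LP_DEAD items */
	bool		all_frozen;			/* page would be all-frozen after freezing */
	int			nfrozen;			/* number of prepared freeze plans */
	bool		needs_wal;			/* stands in for RelationNeedsWAL() */
	bool		did_tuple_hint_fpi; /* setting hint bits already emitted an FPI */
	bool		do_prune;			/* pruning will modify the page */
	bool		do_hint_full_or_prunable;	/* only page hints will change */
	bool		hint_bits_logged;	/* stands in for XLogHintBitIsNeeded() */
	bool		buffer_needs_fpi;	/* stands in for XLogCheckBufferNeedsBackup() */
} FreezeInputs;

/* Returns true if the prepared freeze plans should be executed. */
static bool
sketch_will_freeze(const FreezeInputs *in)
{
	assert(in != NULL);

	/* The caller opted out of freezing entirely. */
	if (!in->attempt_freeze)
		return false;

	/* Freezing is mandatory to advance relfrozenxid/relminmxid. */
	if (in->freeze_required)
		return true;

	/* Opportunistic freezing only pays off if the page ends up all-frozen. */
	if (!(in->all_visible && in->all_frozen && in->nfrozen > 0))
		return false;

	/* Without WAL there is no FPI to piggyback on. */
	if (!in->needs_wal)
		return false;

	/* Freeze if an FPI was already emitted or will be emitted anyway. */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_fpi;
	if (in->do_hint_full_or_prunable)
		return in->hint_bits_logged && in->buffer_needs_fpi;

	return false;
}
```

The sketch deliberately leaves out the side effects of the real function: the pre-freeze checks, resetting all_frozen and nfrozen when freezing is declined, and clearing all_visible when LP_DEAD items remain.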



  [text/x-patch] v5-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 21-v5-0019-Reorder-heap_page_prune_and_freeze-parameters.patch)
From 86abf2be861bcab612737be55256edd6e67cd597 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v5 19/20] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d943b0c64f..20f4a62fb16 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 777ec30eb82..120782fd8ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1992,11 +1992,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-01 21:36  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 3 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-08-01 21:36 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
<[email protected]> wrote:
>
> The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
> patch in the set. It sets pd_prune_xid on insert (so pages filled by
> COPY or insert can also be set all-visible in the VM before they are
> vacuumed). I gave it a .txt extension because it currently fails
> 035_standby_logical_decoding due to a recovery conflict. I need to
> investigate more to see if this is a bug in my patch set or elsewhere
> in Postgres.

I figured out that if we set the VM on-access, we need to enable
hot_standby_feedback in more places in 035_standby_logical_decoding.pl
to avoid recovery conflicts. I've done that in the attached updated
version 6. There are a few other issues in
035_standby_logical_decoding.pl that I reported here [1]. With these
changes, setting pd_prune_xid on insert passes tests. Whether or not
we want to do it (and what the heuristic should be for deciding when
to do it) is another question.

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%4...


Attachments:

  [text/x-patch] v6-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (10.8K, 2-v6-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
From 62aaaf33ff9fcc256c42579c5dce9e9e6e6344cd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v6 01/20] Eliminate xl_heap_visible in COPY FREEZE

Rather than emitting a separate xl_heap_visible WAL record to set the VM
bits, specify the changes to make to the VM block in the
xl_heap_multi_insert record.
---
 src/backend/access/heap/heapam.c        | 47 ++++++++++---------
 src/backend/access/heap/heapam_xlog.c   | 39 +++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 62 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 132 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 8f918e00af7..0bc64203959 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page and log
+ *		visibilitymap_set_vmbyte - set a bit in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -318,6 +319,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0

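Reader's note on visibilitymap_set_vmbyte() above: the bit arithmetic can be tried outside the server with a plain byte array. The following is a minimal sketch, not the patch's code. It assumes the usual layout of two VM bits per heap block (four heap blocks per byte) and ignores the split of the map into pages. The names model_set_vmbyte and the VM_* constants are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the VISIBILITYMAP_* bits; names here are hypothetical. */
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03

/* Assumed layout: two bits per heap block, so four heap blocks per byte. */
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE 4

/*
 * Set 'flags' for heap_blk in a flat byte array and return the previous
 * status bits. Like the function in the patch, this only ever sets bits.
 */
static uint8_t
model_set_vmbyte(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	/* Flags must be valid, and all-frozen implies all-visible. */
	assert((flags & VM_VALID_BITS) == flags);
	assert(flags != VM_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VM_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Because the update is an OR, passing only the all-visible bit for a block that is already all-frozen leaves the frozen bit in place; the return value is how a caller can tell whether anything changed.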


  [text/x-patch] v6-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (24.4K, 3-v6-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
  download | inline diff:
From 17427256d348f7414bfb8ceb74e00e3d8cd390a5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v6 03/20] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
 src/backend/access/heap/heapam_xlog.c  | 142 ++++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 149 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  13 ++-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   3 +
 6 files changed, 296 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = (Page) BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbyte(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets has changed, that's wrong */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* The record also sets VM bits; a uint8 of vmflags follows in the main data */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0

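Reader's note on the record layout in 0003: after the fixed xl_heap_prune header, the main data carries an optional conflict horizon followed by an optional vmflags byte, in that order. Both heap_xlog_prune_freeze() and heap2_desc() walk it with a moving pointer. The sketch below mimics that walk on a raw buffer. DemoPruneTail and demo_parse_prune_tail are hypothetical names, and the value used for the conflict-horizon flag is a placeholder; only the (1 << 0) value for the VM flag comes from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HAS_VMFLAGS			(1 << 0)	/* same value as XLHP_HAS_VMFLAGS */
#define DEMO_HAS_CONFLICT_HORIZON	(1 << 3)	/* placeholder value */

typedef struct DemoPruneTail
{
	uint32_t	conflict_xid;	/* 0 if not present */
	uint8_t		vmflags;		/* 0 if not present */
	size_t		consumed;		/* bytes read from the main data */
} DemoPruneTail;

/*
 * Parse the optional fields that follow the fixed header. 'ptr' points at
 * the first byte after the header. memcpy is used because the data is not
 * guaranteed to be aligned.
 */
static DemoPruneTail
demo_parse_prune_tail(uint8_t flags, const char *ptr)
{
	DemoPruneTail out = {0, 0, 0};
	const char *start = ptr;

	if (flags & DEMO_HAS_CONFLICT_HORIZON)
	{
		memcpy(&out.conflict_xid, ptr, sizeof(uint32_t));
		ptr += sizeof(uint32_t);
	}

	if (flags & DEMO_HAS_VMFLAGS)
	{
		memcpy(&out.vmflags, ptr, sizeof(uint8_t));
		ptr += sizeof(uint8_t);
	}

	out.consumed = (size_t) (ptr - start);
	return out;
}
```

The point of the ordering is that any reader must advance past the conflict horizon before reading vmflags, which is why heap2_desc() gains its own maindataptr in this patch.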

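Reader's note on the redo-side rule in heap_xlog_prune_freeze(): whether the heap buffer is dirtied and whether its LSN is advanced are now decided separately. As I read the hunk, the rule reduces to the small decision function below. This is a sketch with hypothetical names (DemoHeapRedo, demo_heap_redo_decision); hint_bits_logged stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

typedef struct DemoHeapRedo
{
	bool		mark_dirty;		/* call MarkBufferDirty() on the heap buffer */
	bool		set_lsn;		/* call PageSetLSN() on the heap page */
} DemoHeapRedo;

static DemoHeapRedo
demo_heap_redo_decision(bool do_prune, int nplans,
						bool setting_vm, bool page_all_visible,
						bool hint_bits_logged)
{
	DemoHeapRedo d;

	/* Pruning or freezing always dirties the page and advances its LSN. */
	d.mark_dirty = d.set_lsn = (do_prune || nplans > 0);

	/*
	 * Setting PD_ALL_VISIBLE alone dirties the page, but it only needs an
	 * LSN bump when hint-bit changes are WAL-logged (checksums or
	 * wal_log_hints).
	 */
	if (setting_vm && !page_all_visible)
	{
		d.mark_dirty = true;
		if (hint_bits_logged)
			d.set_lsn = true;
	}

	return d;
}
```

Note the case where the page already has PD_ALL_VISIBLE and nothing is pruned or frozen: the heap page is left untouched and only the VM block is updated.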

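Reader's note on heap_page_is_all_visible_except_lpdead(): the final step accepts the page only if the LP_DEAD items found on it are exactly the ones vacuum is about to mark LP_UNUSED. A standalone sketch of that comparison follows. DemoOffset and demo_dead_items_match are hypothetical names, and the memcmp only works under the assumption that both arrays list offsets in the same (ascending) order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t DemoOffset;	/* stand-in for OffsetNumber */

/*
 * Return true if the dead items found on the page are exactly the expected
 * ones. Both arrays are assumed to be sorted in ascending offset order.
 */
static bool
demo_dead_items_match(const DemoOffset *found, size_t nfound,
					  const DemoOffset *expected, size_t nexpected)
{
	/* Caller expects no dead items at all. */
	if (nexpected == 0)
		return nfound == 0;

	/* A different count means the page changed under us. */
	if (nfound != nexpected)
		return false;

	return memcmp(found, expected, nexpected * sizeof(DemoOffset)) == 0;
}
```

Passing NULL with a count of zero corresponds to the heap_page_is_all_visible() wrapper, which tolerates no LP_DEAD items.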
  [text/x-patch] v6-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 4-v6-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch)
  download | inline diff:
From b91503ceb9a923d922da88f13282a860916a9882 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v6 05/20] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0



  [text/x-patch] v6-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.1K, 5-v6-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From fc11a1942761e1d9f84b805c57333dddede5aa83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v6 02/20] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)
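
Reviewer note (illustrative only, not part of the patch): the refactoring
pattern used here is to replace a pointer to a large state struct with
the few values actually needed, plus an out-parameter for the error
callback's offset. The sketch below shows that shape with hypothetical
stand-in types (DemoXid, DemoOffset, DemoTuple); none of these names
exist in PostgreSQL. It also compares XIDs as plain integers, which
ignores wraparound and is therefore only a simplification of
TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL types; not the real definitions. */
typedef uint32_t DemoXid;
typedef uint16_t DemoOffset;
#define DEMO_INVALID_OFFSET ((DemoOffset) 0)

typedef struct DemoTuple
{
    DemoXid     xmin;       /* inserting transaction */
    bool        frozen;     /* tuple already frozen? */
} DemoTuple;

/*
 * Returns true if every tuple's xmin is older than oldest_xmin.
 *
 * Everything the function needs is passed explicitly, so a caller does
 * not have to own a vacuum state struct. *logging_offnum is updated as
 * each tuple is examined and reset once the page has been processed.
 */
static bool
demo_page_is_all_visible(const DemoTuple *tuples, int ntuples,
                         DemoXid oldest_xmin,
                         bool *all_frozen,
                         DemoXid *visibility_cutoff_xid,
                         DemoOffset *logging_offnum)
{
    bool        all_visible = true;

    assert(all_frozen && visibility_cutoff_xid && logging_offnum);
    *all_frozen = true;
    *visibility_cutoff_xid = 0;

    for (int i = 0; i < ntuples; i++)
    {
        /* Offsets are 1-based, as with OffsetNumber. */
        *logging_offnum = (DemoOffset) (i + 1);

        /* Simplified comparison: no XID wraparound handling. */
        if (tuples[i].xmin >= oldest_xmin)
        {
            all_visible = false;
            *all_frozen = false;
            break;
        }

        /* Track the newest xmin among the visible tuples. */
        if (tuples[i].xmin > *visibility_cutoff_xid)
            *visibility_cutoff_xid = tuples[i].xmin;

        if (!tuples[i].frozen)
            *all_frozen = false;
    }

    /* Clear the offset once the page has been processed. */
    *logging_offnum = DEMO_INVALID_OFFSET;

    return all_visible;
}
```

A vacuum-like caller would pass the address of its own offset field as
logging_offnum, while a caller without such state can pass the address
of a local dummy variable.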

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible
+ * tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [text/x-patch] v6-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (5.8K, 6-v6-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From affe7d96f42d36b5f68bea81dbcf08b44648181b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v6 04/20] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)
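
Reviewer note (illustrative only, not part of the patch): the change to
log_heap_prune_and_freeze() makes a forced full page image take
precedence over the "no image needed" shortcut. The sketch below
isolates that decision. The DEMO_* constants and the function name are
hypothetical and their values are arbitrary; the real REGBUF_* flags are
defined in access/xloginsert.h. hint_bits_need_wal stands in for the
result of XLogHintBitIsNeeded().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define DEMO_REGBUF_STANDARD     0x01
#define DEMO_REGBUF_FORCE_IMAGE  0x02
#define DEMO_REGBUF_NO_IMAGE     0x04

/*
 * Choose buffer registration flags for the heap page.
 *
 * A forced image wins. Otherwise the image can be skipped when the only
 * heap change is setting PD_ALL_VISIBLE (or nothing at all) and hint-bit
 * changes do not need to be WAL-logged.
 */
static uint8_t
demo_choose_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
                         bool set_pd_all_vis, bool hint_bits_need_wal)
{
    uint8_t     flags = DEMO_REGBUF_STANDARD;

    if (force_heap_fpi)
        flags |= DEMO_REGBUF_FORCE_IMAGE;
    else if (!do_prune &&
             nfrozen == 0 &&
             (!set_pd_all_vis || !hint_bits_need_wal))
        flags |= DEMO_REGBUF_NO_IMAGE;

    return flags;
}
```

For the empty-page case in lazy_scan_new_or_empty(), the caller passes
force_heap_fpi = true only when the page LSN is still invalid, that is,
when the page has never been WAL-logged.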

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0



  [text/x-patch] v6-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 7-v6-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 423b44273997b9436123bac012fc6cdb78cea824 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v6 07/20] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 72 deletions(-)
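
Reviewer note (illustrative only, not part of the patch): the two
corruption cases handled by identify_and_fix_vm_corruption() can be
modeled on a tiny in-memory page state. DemoPageState and
demo_fix_vm_corruption() are hypothetical names for this sketch; the
real function works on buffers, emits WARNINGs, and calls
visibilitymap_clear().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory model of one heap page and its VM bits. */
typedef struct DemoPageState
{
    bool        pd_all_visible; /* page-level PD_ALL_VISIBLE flag */
    uint8_t     vm_bits;        /* visibility map bits for this block */
    bool        heap_dirty;     /* heap buffer marked dirty? */
} DemoPageState;

/*
 * Returns true if an inconsistency was found and cleared, in which case
 * the caller should not go on to set the VM for this page.
 */
static bool
demo_fix_vm_corruption(DemoPageState *st, bool blk_known_av,
                       int nlpdead_items)
{
    /*
     * Case 1: a VM bit is set although the page-level flag is clear.
     * The VM bits are rechecked because blk_known_av may be stale.
     */
    if (blk_known_av && !st->pd_all_visible && st->vm_bits != 0)
    {
        st->vm_bits = 0;
        return true;
    }

    /* Case 2: LP_DEAD items on a page marked all-visible. */
    if (nlpdead_items > 0 && st->pd_all_visible)
    {
        st->pd_all_visible = false;
        st->heap_dirty = true;
        st->vm_bits = 0;
        return true;
    }

    return false;
}
```

Only the second case touches the heap page, which is why only that
branch marks the heap buffer dirty.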

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f6cdd9e6828..0c121fdf4e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0



  [text/x-patch] v6-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (3.0K, 8-v6-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch)
From f1169a90d2dea593f4ce565d8311a6cd23157208 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v6 08/20] Keep all_frozen updated too in
 heap_page_prune_and_freeze

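Previously, all_frozen was only valid when all_visible was also set,
because we did not bother to clear all_frozen every time we cleared
all_visible. Clear both together so that each flag is accurate on its
own.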
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
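
Reviewer note (illustrative only, not part of the patch): the invariant
asserted by this commit is "all_frozen implies all_visible". The sketch
below shows one way to keep it true at every update. DemoPruneState and
the helper names are hypothetical; the patch itself assigns the two
PruneState fields inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of PruneState to the two flags of interest. */
typedef struct DemoPruneState
{
    bool        all_visible;
    bool        all_frozen;
} DemoPruneState;

/* A page cannot be all-frozen unless it is also all-visible. */
static void
demo_check_invariant(const DemoPruneState *ps)
{
    assert(!ps->all_frozen || ps->all_visible);
}

/* A tuple that is not visible to everyone clears both flags together. */
static void
demo_record_not_visible(DemoPruneState *ps)
{
    ps->all_visible = ps->all_frozen = false;
    demo_check_invariant(ps);
}

/* A visible but unfrozen tuple clears only all_frozen. */
static void
demo_record_not_frozen(DemoPruneState *ps)
{
    ps->all_frozen = false;
    demo_check_invariant(ps);
}
```

With both flags always consistent, later code can test all_frozen on its
own instead of having to test all_visible first.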

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..6c3653e776c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0



  [text/x-patch] v6-0006-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 9-v6-0006-Combine-vacuum-phase-I-VM-update-cases.patch)
From 8416676eacfad2cfce34279f3edd1b280d1291b3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v6 06/20] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)
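
Reviewer note (illustrative only, not part of the patch): the combined
condition can be read as a small predicate over four booleans. The
sketch below spells it out. The function names and DEMO_VM_* constants
are hypothetical; in the patch, vm_all_visible corresponds to the
possibly stale all_visible_according_to_vm and vm_all_frozen to a fresh
VM_ALL_FROZEN() lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch. */
#define DEMO_VM_ALL_VISIBLE  0x01
#define DEMO_VM_ALL_FROZEN   0x02

/*
 * Should vacuum update the VM for this page after phase I?
 *
 * Yes if the page is all-visible and either the VM does not say so yet,
 * or the page is also all-frozen and the VM lacks the all-frozen bit.
 */
static bool
demo_should_update_vm(bool page_all_visible, bool page_all_frozen,
                      bool vm_all_visible, bool vm_all_frozen)
{
    return page_all_visible &&
        (!vm_all_visible || (page_all_frozen && !vm_all_frozen));
}

/* Bits to set: always all-visible, plus all-frozen when applicable. */
static uint8_t
demo_vm_flags_to_set(bool page_all_frozen)
{
    uint8_t     flags = DEMO_VM_ALL_VISIBLE;

    if (page_all_frozen)
        flags |= DEMO_VM_ALL_FROZEN;

    return flags;
}
```

Both of the previous branches map onto this one predicate: the first
disjunct is the old "set both bits" case and the second is the old
"set only the frozen bit" case.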

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..f6cdd9e6828 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the returned old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2204,66 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v6-0009-Update-VM-in-pruneheap.c.patch (12.7K, 10-v6-0009-Update-VM-in-pruneheap.c.patch)
From 952f9aa12924868d98951ca621d09e7aefc23b81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v6 09/20] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 107 deletions(-)
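
As a reading aid, here is a minimal standalone sketch of the two decisions this patch separates: heap_page_prune_and_freeze() chooses which VM bits to set, and lazy_scan_prune() derives its logging counters from old_vmbits and new_vmbits. The names choose_vmflags(), count_vm_change() and VmCounters are hypothetical and exist only for illustration. The flag values are redefined locally so the sketch compiles on its own, and the VM-corruption branch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Redefined locally so the sketch compiles outside the PostgreSQL tree. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three vm_new_* counters in LVRelState. */
typedef struct VmCounters
{
	int64_t		new_visible_pages;
	int64_t		new_visible_frozen_pages;
	int64_t		new_frozen_pages;
} VmCounters;

/*
 * Decide which VM bits to set.  Mirrors the "else if" condition in
 * heap_page_prune_and_freeze(); vm_all_frozen stands in for the result of
 * VM_ALL_FROZEN().  Returns 0 when the VM does not need to change.
 */
static uint8_t
choose_vmflags(bool all_visible, bool all_frozen,
			   bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		vmflags = 0;

	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		vmflags = VISIBILITYMAP_ALL_VISIBLE;
		if (all_frozen)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}

/*
 * Update the counters from the VM state before and after the update.
 * Returns true if the page was newly set all-frozen (the value stored in
 * *vm_page_frozen by lazy_scan_prune()).
 */
static bool
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		vm_page_frozen = false;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page was not all-visible in the VM before and is now. */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			vm_page_frozen = true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible; only the frozen bit is new. */
		c->new_frozen_pages++;
		vm_page_frozen = true;
	}
	return vm_page_frozen;
}
```

Because new_vmbits stays 0 whenever the VM is left alone, a caller that only looks at the two bitmasks counts nothing in that case, which is why the vm_corruption flag can be dropped from PruneFreezeResult.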

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6c3653e776c..05227ce0339 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c121fdf4e6..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,88 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0



  [text/x-patch] v6-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (28.5K, 11-v6-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From 64f09710ba6c738870217aa7fcd34e50bd52b93e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v6 10/20] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 454 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 278 insertions(+), 221 deletions(-)
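
Since this patch merges the VM update into the prune/freeze record, a single snapshot conflict horizon has to cover pruning, freezing, and the VM change. The following standalone sketch models the selection order used in the pruneheap.c hunk below. It is an illustration only: choose_conflict_xid(), xid_follows() and xid_retreat() are hypothetical helpers written to behave like TransactionIdFollows() and TransactionIdRetreat(), and the sketch stops before the final step in which the horizon may be omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified local stand-ins for the definitions in access/transam.h. */
typedef uint32_t TransactionId;
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Modeled on TransactionIdFollows(): is a logically later than b? */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	/* Normal XIDs compare modulo 2^32. */
	return (int32_t) (a - b) > 0;
}

/* Modeled on TransactionIdRetreat(): step back, skipping special XIDs. */
static TransactionId
xid_retreat(TransactionId xid)
{
	do
	{
		xid--;
	} while (xid < FirstNormalTransactionId);
	return xid;
}

/*
 * Pick the conflict horizon for the combined record.  The parameters
 * correspond to do_set_vm, do_freeze, all_frozen_except_lp_dead,
 * prstate.visibility_cutoff_xid, cutoffs->OldestXmin and
 * prstate.latest_xid_removed in the patch.
 */
static TransactionId
choose_conflict_xid(bool do_set_vm, bool do_freeze,
					bool all_frozen_except_lp_dead,
					TransactionId visibility_cutoff_xid,
					TransactionId oldest_xmin,
					TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
	{
		/* Pessimistic horizon: just before OldestXmin. */
		conflict_xid = xid_retreat(oldest_xmin);
	}

	/* Removed tuples with a younger xmax take precedence. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

For instance, a record that only prunes (neither freezing nor setting the VM) ends up with latest_xid_removed as its horizon, because any valid XID follows InvalidTransactionId in this comparison.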

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05227ce0339..cf9e5215d6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that old_vmbits and new_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If neither freezing nor updating the VM, we avoid the extra
+	 * bookkeeping. Initializing all_visible and all_frozen to false allows
+	 * skipping their updates in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
+
+			/*
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
+			 */
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with a younger xmax than our so far
+			 * calculated conflict_xid, we must use this as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
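
For illustration, the snapshot conflict horizon selection that the
pruneheap.c hunk above performs before calling log_heap_prune_and_freeze()
can be modeled in isolation. This is a toy sketch under stated assumptions,
not code from the patch: ToyXid, ToyPruneDecision and the toy_* helpers are
hypothetical names, setting_all_frozen stands in for
(vmflags & VISIBILITYMAP_ALL_FROZEN), and the XID helpers only approximate
the usual wraparound-aware comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

typedef struct ToyPruneDecision
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		all_frozen_except_lp_dead;
	bool		blk_known_av;
	bool		setting_all_frozen; /* models vmflags & ALL_FROZEN */
	ToyXid		visibility_cutoff_xid;
	ToyXid		oldest_xmin;
	ToyXid		latest_xid_removed;
} ToyPruneDecision;

/* "a is newer than b"; special (non-normal) XIDs compare as plain integers. */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (a < TOY_FIRST_NORMAL_XID || b < TOY_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one XID, skipping over the special XIDs below the first normal one. */
static ToyXid
toy_xid_retreat(ToyXid xid)
{
	do
	{
		xid--;
	} while (xid < TOY_FIRST_NORMAL_XID);
	return xid;
}

static ToyXid
toy_conflict_horizon(const ToyPruneDecision *d)
{
	ToyXid		conflict_xid = TOY_INVALID_XID;

	/* Setting the VM, or freezing a page that ends up all-frozen. */
	if (d->do_set_vm || (d->do_freeze && d->all_frozen_except_lp_dead))
		conflict_xid = d->visibility_cutoff_xid;
	/* Freezing only part of the page: fall back to the pessimistic horizon. */
	else if (d->do_freeze)
		conflict_xid = toy_xid_retreat(d->oldest_xmin);

	/* A removed tuple with a newer xmax pushes the horizon forward. */
	if (toy_xid_follows(d->latest_xid_removed, conflict_xid))
		conflict_xid = d->latest_xid_removed;

	/* Only marking an already all-visible page all-frozen: no conflict. */
	if (!d->do_prune && !d->do_freeze && d->do_set_vm &&
		d->blk_known_av && d->setting_all_frozen)
		conflict_xid = TOY_INVALID_XID;

	return conflict_xid;
}
```

A caller would fill in a ToyPruneDecision from the flags computed for the
page and pass it to toy_conflict_horizon(). The branch order mirrors the
hunk: the VM/all-frozen case first, then the partial-freeze fallback, then
the latest_xid_removed override, and finally the "no conflict" special case.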



  [text/x-patch] v6-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (7.1K, 12-v6-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch)
  download | inline diff:
From 099097a6c0886bd7ac284ba4de6f26fde6f4fb5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v6 13/20] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, in order to determine whether the page can be set
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll(). GlobalVisTestIsRemovableFullXid()
is likewise renamed to GlobalVisFullXidVisible().
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ffc12314b41..715dfc16ba7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..4b8e5747239 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf987aed8d3..508bb379f87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..aec0692b5db 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -97,8 +97,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v6-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch (10.5K, 13-v6-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch)
  download | inline diff:
From 84467961e150272593c85cdbc732d1311dd8ae74 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v6 14/20] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. If we have
maintained the visibility_cutoff_xid, we then compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, because
previously we could set all_visible to false as soon as we encountered
a live tuple newer than OldestXmin. However, these extra comparisons
were found not to be significant in a profile.
---
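For illustration only, here is a toy model of the once-per-page check
described above. It is a minimal sketch under stated assumptions, not the
code in this patch: ToyXid, ToyVisState and the toy_* functions are
hypothetical, every tuple is assumed to be live and committed, and the real
GlobalVisState is modeled as a single horizon XID, which ignores the extra
work the real test may do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

typedef struct ToyVisState
{
	ToyXid		horizon;		/* XIDs older than this are visible to all */
	int			ntests;			/* number of horizon tests performed */
} ToyVisState;

/* Wraparound-aware "a is older than b"; valid for normal XIDs only. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Models the (comparatively expensive) test against the visibility state. */
static bool
toy_xid_visible_to_all(ToyVisState *vs, ToyXid xid)
{
	vs->ntests++;
	return toy_xid_precedes(xid, vs->horizon);
}

/*
 * Track the newest normal xmin while scanning the tuples, then test only
 * that one XID against the horizon after the scan.
 */
static bool
toy_page_all_visible(ToyVisState *vs, const ToyXid *xmins, int ntuples)
{
	ToyXid		cutoff = TOY_INVALID_XID;

	assert(ntuples >= 0);

	for (int i = 0; i < ntuples; i++)
	{
		/* Frozen and other special XIDs never move the cutoff. */
		if (xmins[i] < TOY_FIRST_NORMAL_XID)
			continue;
		if (cutoff == TOY_INVALID_XID || toy_xid_precedes(cutoff, xmins[i]))
			cutoff = xmins[i];
	}

	/* One horizon test per page, and none if no normal xmin was seen. */
	if (cutoff != TOY_INVALID_XID && !toy_xid_visible_to_all(vs, cutoff))
		return false;

	return true;
}
```

The ntests counter exists only to make the cost visible: however many
tuples the page holds, the horizon test runs at most once, and not at all
when every xmin on the page is a special (for example, frozen) XID.
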
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 17 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 715dfc16ba7..ab79d8a3ed9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a0fa371a06..777ec30eb82 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-												   TransactionId OldestXmin,
+												   GlobalVisState *vistest,
 												   OffsetNumber *deadoffsets,
 												   int allowed_num_offsets,
 												   bool *all_frozen,
@@ -2716,7 +2716,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
-											   vacrel->cutoffs.OldestXmin,
+											   vacrel->vistest,
 											   deadoffsets, num_offsets,
 											   &all_frozen, &visibility_cutoff_xid,
 											   &vacrel->offnum))
@@ -3459,13 +3459,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+	return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
 												  NULL, 0,
 												  all_frozen,
 												  visibility_cutoff_xid,
@@ -3500,7 +3500,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-									   TransactionId OldestXmin,
+									   GlobalVisState *vistest,
 									   OffsetNumber *deadoffsets,
 									   int allowed_num_offsets,
 									   bool *all_frozen,
@@ -3555,8 +3555,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3575,8 +3575,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v6-0012-Remove-xl_heap_visible-entirely.patch (24.3K, 14-v6-0012-Remove-xl_heap_visible-entirely.patch)
  download | inline diff:
From a9341bda057d50769b6fbb109d847324ab837de9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v6 12/20] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 154 +----------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 106 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 30 insertions(+), 365 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..64f06d46bf1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -82,10 +82,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		memcpy(&vmflags, maindataptr, sizeof(uint8));
 		maindataptr += sizeof(uint8);
 
-		/*
-		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
-		 * because we already have XLHP_IS_CATALOG_REL.
-		 */
 		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 		/* Must never set all_frozen bit without also setting all_visible bit */
 		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -267,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +650,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_set_vmbyte(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -1380,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 82127e8728b..ffc12314b41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbyte(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0bc64203959..5ed54e06dd4 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page and log
- *      visibilitymap_set_vmbyte - set a bit in a pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -338,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * making any changes needed to the associated heap page.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..a64677b7bca 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -438,20 +437,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -495,11 +480,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..98b1adc4e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4274,7 +4274,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v6-0011-Rename-PruneState.freeze-to-attempt_freeze.patch (4.1K, 15-v6-0011-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 07b7bc0bccd41d93e92e2dee4f5a020dbf3e5b0c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v6 11/20] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the field indicates that the caller would like
heap_page_prune_and_freeze() to consider freezing tuples, not that we
will ultimately end up freezing them.

Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. The
new name makes it clear that it concerns tuple hints rather than page
hints, and that it records something that happened rather than something
that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cf9e5215d6b..82127e8728b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 
 	/*
 	 * Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	bool		all_frozen_except_lp_dead = false;
 	bool		set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
 	 * tuple to know whether or not the page will be totally frozen.
 	 */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v6-0015-Inline-TransactionIdFollows-Precedes.patch (4.9K, 16-v6-0015-Inline-TransactionIdFollows-Precedes.patch)
From 536d921a94bb3242583c97944e351b6f6a17d600 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v6 15/20] Inline TransactionIdFollows/Precedes()

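Calling these from on-access pruning code had noticeable overhead in a
profile. Move all four comparison functions (TransactionIdPrecedes,
TransactionIdPrecedesOrEquals, TransactionIdFollows, and
TransactionIdFollowsOrEquals) into transam.h as static inline functions;
there does not seem to be a reason not to inline them.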
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
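
For readers following along, the wraparound-aware comparison that the helpers moved into transam.h perform can be sketched in isolation. This is a minimal standalone illustration, not the PostgreSQL code itself: `xid_t`, `FIRST_NORMAL_XID`, `xid_is_normal()`, `xid_precedes()` and `xid_compare_demo()` are hypothetical stand-ins for `TransactionId`, `FirstNormalTransactionId`, `TransactionIdIsNormal()` and `TransactionIdPrecedes()`, and the sketch assumes the usual two's-complement conversion from unsigned to signed 32-bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId. */
typedef uint32_t xid_t;

/* Assumed here: XIDs below 3 are permanent (special) XIDs. */
#define FIRST_NORMAL_XID ((xid_t) 3)

static inline bool
xid_is_normal(xid_t xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/*
 * Is id1 logically < id2?  Permanent XIDs compare as plain unsigned
 * values; two normal XIDs are compared modulo 2^32 by looking at the
 * sign of their 32-bit difference.
 */
static inline bool
xid_precedes(xid_t id1, xid_t id2)
{
	int32_t		diff;

	if (!xid_is_normal(id1) || !xid_is_normal(id2))
		return id1 < id2;

	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

/* Small demonstration of the ordinary and wrapped-around cases. */
void
xid_compare_demo(void)
{
	/* Ordinary case: 100 is older than 200. */
	assert(xid_precedes(100, 200));

	/* Wraparound: an XID near 2^32 is older than a small normal XID. */
	assert(xid_precedes(4294967000u, 10));
}
```

The signed-difference trick means each normal XID sees roughly 2^31 XIDs as "older" and 2^31 as "newer", which is why the permanent XIDs need the separate unsigned path.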



  [text/x-patch] v6-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.4K, 17-v6-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 17dabfb6dade53ab1a73272edc383ed482989329 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v6 17/20] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 63 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 ++++++++++++++
 src/backend/access/table/tableam.c            | 39 ++++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 ++-
 src/backend/executor/nodeIndexscan.c          | 18 ++++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 ++++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  8 ++-
 16 files changed, 278 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..f90b014a9b0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..c68283de6f2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80d055e5376..dad341cb265 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we are neither pruning nor freezing,
+	 * avoid setting the visibility map if doing so would newly dirty the heap
+	 * page or, when the page is already dirty, would require a full-page
+	 * image (FPI) of the heap page in the WAL. This situation should be
+	 * rare, as on-access pruning is only attempted when pd_prune_xid is
+	 * valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..15e1853027b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -362,6 +367,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -369,8 +375,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -400,8 +409,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..326d7d78860 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query, either through
+	 * UPDATE/DELETE/INSERT/MERGE or through row marks (e.g. SELECT FOR UPDATE)
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 921813483e3..5d0863a7933 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -9,6 +9,7 @@ use warnings FATAL => 'all';
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
+use Time::HiRes qw(usleep);
 
 if ($ENV{enable_injection_points} ne 'yes')
 {
@@ -295,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -744,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
@@ -754,12 +756,12 @@ wait_until_vacuum_can_remove(
 
 # message should not be issued
 ok( !$node_standby->log_contains(
-		"invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart),
+		"invalidating obsolete replication slot \"no_conflict_inactiveslot\"", $logstart),
 	'inactiveslot slot invalidation is not logged with vacuum on conflict_test'
 );
 
 ok( !$node_standby->log_contains(
-		"invalidating obsolete slot \"no_conflict_activeslot\"", $logstart),
+		"invalidating obsolete replication slot \"no_conflict_activeslot\"", $logstart),
 	'activeslot slot invalidation is not logged with vacuum on conflict_test'
 );
 
-- 
2.43.0
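
As a reading aid for the on-access check added to heap_page_prune_and_freeze() in the patch above, the condition under which the VM update is skipped can be written as a pure predicate. This is only a sketch under stated assumptions: the `VmSetInputs` struct and `skip_on_access_vm_set()` are hypothetical names, and the two buffer-related booleans stand in for the results of `BufferIsDirty(buffer)` and `XLogCheckBufferNeedsBackup(buffer)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the values the patch's condition looks at. */
typedef struct VmSetInputs
{
	bool		on_access;		/* reason == PRUNE_ON_ACCESS */
	bool		consider_update_vm; /* caller allowed VM updates */
	bool		all_visible;	/* page was found all-visible */
	bool		do_prune;		/* pruning will modify the page */
	bool		do_freeze;		/* freezing will modify the page */
	bool		buffer_dirty;	/* stand-in for BufferIsDirty() */
	bool		needs_fpi;		/* stand-in for XLogCheckBufferNeedsBackup() */
} VmSetInputs;

/*
 * Returns true when an on-access caller should give up on setting the
 * VM: the page would otherwise be left untouched, and setting the VM
 * would either newly dirty it or force a full-page image into the WAL.
 */
static inline bool
skip_on_access_vm_set(const VmSetInputs *in)
{
	return in->on_access &&
		in->consider_update_vm &&
		in->all_visible &&
		!in->do_prune && !in->do_freeze &&
		(!in->buffer_dirty || in->needs_fpi);
}
```

Read this way, the VM is still set on access when the page is already dirty and no FPI is needed, or whenever pruning or freezing is going to modify the page anyway.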



  [text/x-patch] v6-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 18-v6-0019-Reorder-heap_page_prune_and_freeze-parameters.patch)
  download | inline diff:
From d0d3520b2ee064b93449dede1e8ff88b5dc35510 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v6 19/20] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d943b0c64f..20f4a62fb16 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 777ec30eb82..120782fd8ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1992,11 +1992,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v6-0016-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 19-v6-0016-Unset-all-visible-sooner-if-not-freezing.patch)
From 2f2bfc3d3b436f460ae91e5cbdc8404063b90936 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v6 16/20] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab79d8a3ed9..80d055e5376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
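
The bookkeeping change in the patch above can be sketched in standalone C. This is a minimal model, not the pruneheap.c code: ToyPruneState, toy_record_dead() and toy_finish_page() are hypothetical stand-ins that keep only the fields the patch touches, and the deferred unset is modeled as a single end-of-page step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for PruneState. */
typedef struct ToyPruneState
{
    bool attempt_freeze;    /* caller asked for freezing (e.g. vacuum) */
    bool all_visible;
    bool all_frozen;
    int  lpdead_items;
} ToyPruneState;

/*
 * Record one LP_DEAD item. When freezing will not be attempted, nothing is
 * gained by deferring, so all_visible/all_frozen are unset immediately.
 */
void
toy_record_dead(ToyPruneState *prstate)
{
    if (!prstate->attempt_freeze)
        prstate->all_visible = prstate->all_frozen = false;

    prstate->lpdead_items++;
}

/*
 * Model of the deferred path: once the freeze decision has been made, any
 * LP_DEAD item means the page is not all-visible right now.
 */
void
toy_finish_page(ToyPruneState *prstate)
{
    if (prstate->lpdead_items > 0)
        prstate->all_visible = prstate->all_frozen = false;
}
```

With attempt_freeze set, the flags survive toy_record_dead() and are cleared only in toy_finish_page(); without it they are cleared on the first dead item.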



  [text/x-patch] v6-0018-Add-helper-functions-to-heap_page_prune_and_freez.patch (18.9K, 20-v6-0018-Add-helper-functions-to-heap_page_prune_and_freez.patch)
From e80c4241826a58f212601798cd398c4a318a6511 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v6 18/20] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
   heuristics, and state gathered during stage 2 whether or not to
   freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
 1 file changed, 295 insertions(+), 176 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dad341cb265..5d943b0c64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page, even though they can be derived from buffer,
+ * to avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
-
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
-	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
-	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
 	/* Save these for the caller in case we later zero out vmflags */
 	presult->new_vmbits = vmflags;
 
-	/* Any error while applying the changes is critical */
+	/*
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
+	 */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0
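
The evaluation stage that the patch above splits into helpers can be condensed into a small standalone model. This is a sketch under stated assumptions, not the patch's code: the toy_* names are hypothetical, the several FPI checks are collapsed into a single fpi_expected flag, and the VM-corruption check and the on-access veto are left out. The two flag values match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02

/*
 * Freeze decision: mandatory if an old XID/MXID forces it, otherwise only
 * opportunistic when freezing would make the page all-frozen and an FPI is
 * expected anyway (fpi_expected is an assumed simplification).
 */
bool
toy_will_freeze(bool attempt_freeze, bool freeze_required,
                bool would_be_all_frozen, int nfrozen, bool fpi_expected)
{
    if (!attempt_freeze)
        return false;
    if (freeze_required)
        return true;
    return would_be_all_frozen && nfrozen > 0 && fpi_expected;
}

/*
 * VM decision: set bits only if the page is all-visible and the VM does not
 * already say so, or if the all-frozen bit can newly be added.
 */
uint8_t
toy_choose_vmflags(bool all_visible, bool all_frozen,
                   bool blk_known_av, bool vm_all_frozen)
{
    uint8_t vmflags = 0;

    if (all_visible &&
        (!blk_known_av || (all_frozen && !vm_all_frozen)))
    {
        vmflags |= TOY_VM_ALL_VISIBLE;
        if (all_frozen)
            vmflags |= TOY_VM_ALL_FROZEN;
    }
    return vmflags;
}

/* PD_ALL_VISIBLE is set only together with the VM, and only if not set yet. */
bool
toy_set_pd_all_visible(uint8_t vmflags, bool page_is_all_visible)
{
    return vmflags != 0 && !page_is_all_visible;
}
```

The ordering matters in the same way as in the patch: the freeze decision runs first because it can clear all_frozen, which then feeds the choice of VM bits.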



  [text/x-patch] v6-0020-Set-pd_prune_xid-on-insert.patch (4.4K, 21-v6-0020-Set-pd_prune_xid-on-insert.patch)
From 40bcb601af134bfa13af29baecf5d6a6f299e5d7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v6 20/20] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
---
 src/backend/access/heap/heapam.c      | 25 +++++++++++++++++--------
 src/backend/access/heap/heapam_xlog.c | 15 ++++++++++++++-
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f90b014a9b0..e0f2245052c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2094,6 +2094,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2153,15 +2154,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2171,7 +2176,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2534,8 +2538,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 64f06d46bf1..234e9a401b9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -473,6 +473,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -622,9 +628,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
-- 
2.43.0
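
The prunable hint that the patch above starts setting on insert can be modeled in a few lines of standalone C. This is a simplified sketch, not PostgreSQL's PageSetPrunable(): ToyPage and the toy_* helpers are hypothetical, and the normal-XID check that the patch performs at the call site is folded into the helper. XIDs below 3 are treated as special, matching InvalidTransactionId (0), BootstrapTransactionId (1) and FrozenTransactionId (2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID      ((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)

/* Hypothetical page header holding only the prune hint. */
typedef struct ToyPage
{
    ToyXid pd_prune_xid;
} ToyPage;

bool
toy_xid_is_normal(ToyXid xid)
{
    return xid >= TOY_FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparison, valid for two normal XIDs. */
bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Remember the oldest XID that might make the page prunable. Special XIDs
 * (e.g. in bootstrap mode) never set the hint.
 */
void
toy_page_set_prunable(ToyPage *page, ToyXid xid)
{
    if (!toy_xid_is_normal(xid))
        return;

    if (page->pd_prune_xid == TOY_INVALID_XID ||
        toy_xid_precedes(xid, page->pd_prune_xid))
        page->pd_prune_xid = xid;
}
```

Because only an older XID replaces the stored one, repeated inserts into the same page leave the hint at the oldest inserting transaction.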




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-26 09:58  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-08-26 09:58 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <[email protected]> wrote:
>
> On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
> > patch in the set. It sets pd_prune_xid on insert (so pages filled by
> > COPY or insert can also be set all-visible in the VM before they are
> > vacuumed). I gave it a .txt extension because it currently fails
> > 035_standby_logical_decoding due to a recovery conflict. I need to
> > investigate more to see if this is a bug in my patch set or elsewhere
> > in Postgres.
>
> I figured out that if we set the VM on-access, we need to enable
> hot_standby_feedback in more places in 035_standby_logical_decoding.pl
> to avoid recovery conflicts. I've done that in the attached updated
> version 6. There are a few other issues in
> 035_standby_logical_decoding.pl that I reported here [1]. With these
> changes, setting pd_prune_xid on insert passes tests. Whether or not
> we want to do it (and what the heuristic should be for deciding when
> to do it) is another question.
>
> - Melanie
>
> [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%4...

Hi!

Andrey told me off-list about this thread and I decided to take a look.

I tried to play with each patch in this patchset and find a
corruption, but I was unsuccessful. I will conduct further tests
later. I am not implying that I suspect this patchset causes any
corruption; I am merely attempting to verify it.

I also have a few comments and questions. Here is my (very limited)
review of 0001, because I believe that removing xl_heap_visible from
COPY FREEZE is a pure win, so this patch can be very beneficial by
itself.

visibilitymap_set_vmbyte is introduced in 0001 and removed in 0012.
This seems strange to me; maybe we can avoid visibilitymap_set_vmbyte in
the first place?

In 0001:

1)
should we add "Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
LW_EXCLUSIVE));" in visibilitymap_set_vmbyte?

Also here  `Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer),
vmbuffer));` can be beneficial:

>/*
>+ * If we're only adding already frozen rows to a previously empty
>+ * page, mark it as all-frozen and update the visibility map. We're
>+ * already holding a pin on the vmbuffer.
>+ */
>   else if (all_frozen_set)
>+ {
>    PageSetAllVisible(page);
>+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
>+ visibilitymap_set_vmbyte(relation,
>+ BufferGetBlockNumber(buffer),
>+ vmbuffer,
>+ VISIBILITYMAP_ALL_VISIBLE |
>+ VISIBILITYMAP_ALL_FROZEN);
>+ }

2)
in heap_xlog_multi_insert:

+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(....)

Do we need to pin vmbuffer here? It looks like
XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
and a COPY ... WITH (FREEZE true) test.

3)
>+
> +#ifdef TRACE_VISIBILITYMAP
> + elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
> +#endif

I can see this is merely copy-pasted from visibilitymap_set, but maybe
display "flags" as well?

4) visibilitymap_set receives an XLogRecPtr recptr parameter, which is
set to the WAL record LSN during recovery and to InvalidXLogRecPtr
otherwise. visibilitymap_set manages the VM page LSN based on this
recptr value (inside the function logic). visibilitymap_set_vmbyte does
the opposite and makes its caller responsible for page LSN management.
Maybe we should keep these two functions akin to each other?


-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-26 20:01  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-08-26 20:01 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <[email protected]> wrote:
>
> On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
> > patch in the set. It sets pd_prune_xid on insert (so pages filled by
> > COPY or insert can also be set all-visible in the VM before they are
> > vacuumed). I gave it a .txt extension because it currently fails
> > 035_standby_logical_decoding due to a recovery conflict. I need to
> > investigate more to see if this is a bug in my patch set or elsewhere
> > in Postgres.
>
> I figured out that if we set the VM on-access, we need to enable
> hot_standby_feedback in more places in 035_standby_logical_decoding.pl
> to avoid recovery conflicts. I've done that in the attached updated
> version 6. There are a few other issues in
> 035_standby_logical_decoding.pl that I reported here [1]. With these
> changes, setting pd_prune_xid on insert passes tests. Whether or not
> we want to do it (and what the heuristic should be for deciding when
> to do it) is another question.
>
> - Melanie
>
> [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%4...


0002 No comments from me. Looks straightforward.

A few comments on 0003.

1) This patch introduces XLHP_HAS_VMFLAGS. However it lacks some
helpful comments about this new status bit.

a) In heapam_xlog.h, in xl_heap_prune struct definition:


/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
* unaligned
*/
+ /*
+ * If XLHP_HAS_VMFLAGS is set, the newly set visibility map bits follow,
+ * unaligned
+ */

b)

We could add a comment here explaining why we use memcpy, akin to /*
memcpy() because snapshot_conflict_horizon is stored unaligned */:

+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
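
To illustrate the memcpy point, here is a minimal standalone sketch. It
is not the patch's code: the flag names, the ParsedFields struct and
parse_optional_fields() are hypothetical stand-ins for how optional
fields packed back to back after a fixed header can be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits, standing in for the XLHP_HAS_* flags. */
#define HAS_HORIZON 0x01
#define HAS_VMFLAGS 0x02

typedef struct ParsedFields
{
	uint32_t	horizon;		/* 0 if absent */
	uint8_t		vmflags;		/* 0 if absent */
} ParsedFields;

/*
 * Read the optional fields that follow a record header. They are packed
 * without padding, so the pointer may not be suitably aligned for a direct
 * uint32_t load; memcpy is the portable way to read them.
 * Returns the number of bytes consumed.
 */
static size_t
parse_optional_fields(const char *data, size_t len, uint8_t flags,
					  ParsedFields *out)
{
	const char *ptr = data;

	out->horizon = 0;
	out->vmflags = 0;

	if (flags & HAS_HORIZON)
	{
		assert((size_t) (ptr - data) + sizeof(uint32_t) <= len);
		memcpy(&out->horizon, ptr, sizeof(uint32_t));
		ptr += sizeof(uint32_t);
	}
	if (flags & HAS_VMFLAGS)
	{
		assert((size_t) (ptr - data) + sizeof(uint8_t) <= len);
		memcpy(&out->vmflags, ptr, sizeof(uint8_t));
		ptr += sizeof(uint8_t);
	}
	return (size_t) (ptr - data);
}
```

A single byte has no alignment requirement, so for a one-byte field the
memcpy mostly keeps the parsing style uniform with the fields before it.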

2) Should we move the conflict_xid = visibility_cutoff_xid; assignment
to just after the heap_page_is_all_visible_except_lpdead call in
lazy_vacuum_heap_page?

3) Looking at this diff, I do not understand one thing: how are we
protected from passing an all-visible page to lazy_vacuum_heap_page? I
did not manage to reproduce such behaviour, though.

+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+



-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-27 12:55  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-08-27 12:55 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <[email protected]> wrote:
>
> On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
> > patch in the set. It sets pd_prune_xid on insert (so pages filled by
> > COPY or insert can also be set all-visible in the VM before they are
> > vacuumed). I gave it a .txt extension because it currently fails
> > 035_standby_logical_decoding due to a recovery conflict. I need to
> > investigate more to see if this is a bug in my patch set or elsewhere
> > in Postgres.
>
> I figured out that if we set the VM on-access, we need to enable
> hot_standby_feedback in more places in 035_standby_logical_decoding.pl
> to avoid recovery conflicts. I've done that in the attached updated
> version 6. There are a few other issues in
> 035_standby_logical_decoding.pl that I reported here [1]. With these
> changes, setting pd_prune_xid on insert passes tests. Whether or not
> we want to do it (and what the heuristic should be for deciding when
> to do it) is another question.
>
> - Melanie
>
> [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%4...

v6-0015:
I chose to verify whether this single modification would be beneficial
on the HEAD.

Benchmark I did:

```

\timing
CREATE TABLE zz(i int);
alter table zz set (autovacuum_enabled = false);
TRUNCATE zz;
copy zz from program 'yes 2 | head -n 180000000';
copy zz from program 'yes 2 | head -n 180000000';

delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')',
'}')::int[])[2] = 7 ;

VACUUM FREEZE zz;
```

And I checked the perf top footprint for the last statement (vacuum).  My
detailed results are attached. It is a HEAD vs HEAD+v6-0015 benchmark.

TLDR: function inlining is indeed beneficial: the TransactionIdPrecedes
function disappears from the perf top footprint, though query runtime
does not change much. So, while not resulting in a query speedup, this
can save CPU.
Maybe we can derive an artificial benchmark that shows a query
speedup, but for now I don't have one.
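
For context on why inlining can matter here, the comparison itself is
tiny, so call overhead is a visible share of its cost. Below is a
simplified standalone sketch of a modulo-2^32 XID comparison, modeled on
how TransactionIdPrecedes is generally understood to work. The names
xid_precedes and FIRST_NORMAL_XID are illustrative and this is not the
code from v6-0015.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: XIDs below this value are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Special XIDs compare as plain unsigned integers. */
	if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
		return id1 < id2;

	/* Normal XIDs compare modulo 2^32, so wraparound is handled. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

Marked static inline, the compiler can fold such a check into a hot loop
instead of emitting a call per tuple.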

-- 
Best regards,
Kirill Reshke

without:

Overhead  Shared Ob  Symbol
   7.13%  postgres   [.] heap_page_prune_and_freeze
   6.46%  postgres   [.] heap_prune_record_unchanged_lp_normal
   5.78%  [kernel]   [k] _raw_spin_unlock_irqrestore
   4.51%  postgres   [.] heap_prepare_freeze_tuple
   4.38%  postgres   [.] HeapTupleSatisfiesVacuumHorizon
   4.04%  postgres   [.] heap_page_is_all_visible
   3.79%  postgres   [.] hash_search_with_hash_value
   3.58%  [kernel]   [k] copy_page_from_iter_atomic
   3.51%  [kernel]   [k] mark_buffer_dirty
   3.28%  postgres   [.] pg_checksum_page
   2.08%  postgres   [.] pg_comp_crc32c_avx512
   1.96%  postgres   [.] heap_pre_freeze_checks
   1.90%  postgres   [.] compactify_tuples
   1.87%  postgres   [.] PageRepairFragmentation
   1.82%  postgres   [.] PageSetChecksumCopy
   1.75%  libc.so.6  [.] __memmove_evex_unaligned_erms
   1.72%  postgres   [.] heap_log_freeze_cmp
   1.72%  postgres   [.] log_heap_prune_and_freeze
   1.43%  postgres   [.] LWLockReleaseInternal
   1.36%  postgres   [.] heap_freeze_prepared_tuples
   1.30%  postgres   [.] AdvanceXLInsertBuffer
   1.30%  postgres   [.] TransactionIdPrecedes
   0.96%  postgres   [.] HeapTupleSatisfiesVacuum
   0.91%  [kernel]   [k] filemap_get_entry
   0.87%  [kernel]   [k] ext4_da_write_end
   0.84%  postgres   [.] GetPrivateRefCountEntry
   0.83%  [kernel]   [k] fault_in_readable
   0.81%  postgres   [.] heap_tuple_needs_eventual_freeze
   0.80%  postgres   [.] TransactionIdDidCommit
   0.71%  [kernel]   [k] __block_commit_write
   0.69%  postgres   [.] LWLockAttemptLock
   0.64%  postgres   [.] TransactionLogFetch
   0.62%  postgres   [.] heap_page_prune_execute
   0.62%  [kernel]   [k] refill_stock
   0.57%  postgres   [.] LockBufHdr
   0.53%  [kernel]   [k] lruvec_stat_mod_folio
   0.52%  [kernel]   [k] refill_obj_stock
   0.51%  postgres   [.] TransactionIdFollows



reshke=# \timing
Timing is on.
reshke=# copy zz from program 'yes 2 | head -n 180000000';
COPY 180000000
Time: 58795.832 ms (00:58.796)
reshke=# copy zz from program 'yes 2 | head -n 180000000';

delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')', '}')::int[])[2] = 7 ;

VACUUM FREEZE zz;
COPY 180000000
Time: 62297.357 ms (01:02.297)
DELETE 1592921
Time: 373495.158 ms (06:13.495)
VACUUM
Time: 199150.554 ms (03:19.151)



with:

   7.07%  postgres   [.] heap_prune_record_unchanged_lp_normal
   6.67%  postgres   [.] heap_page_prune_and_freeze
   5.87%  [kernel]   [k] _raw_spin_unlock_irqrestore
   4.48%  postgres   [.] HeapTupleSatisfiesVacuumHorizon
   4.05%  postgres   [.] heap_prepare_freeze_tuple
   4.00%  [kernel]   [k] mark_buffer_dirty
   3.92%  [kernel]   [k] copy_page_from_iter_atomic
   3.59%  postgres   [.] heap_page_is_all_visible
   3.58%  postgres   [.] pg_checksum_page
   3.42%  postgres   [.] hash_search_with_hash_value
   2.26%  postgres   [.] pg_comp_crc32c_avx512
   2.12%  postgres   [.] PageRepairFragmentation
   2.07%  postgres   [.] PageSetChecksumCopy
   1.89%  postgres   [.] heap_pre_freeze_checks
   1.83%  postgres   [.] heap_log_freeze_cmp
   1.65%  [kernel]   [k] filemap_get_entry
   1.64%  postgres   [.] compactify_tuples
   1.53%  postgres   [.] LWLockReleaseInternal
   1.50%  postgres   [.] log_heap_prune_and_freeze
   1.49%  postgres   [.] heap_freeze_prepared_tuples
   1.40%  libc.so.6  [.] __memmove_evex_unaligned_erms
   1.38%  postgres   [.] AdvanceXLInsertBuffer
   1.04%  postgres   [.] HeapTupleSatisfiesVacuum
   1.03%  [kernel]   [k] __block_commit_write
   0.96%  postgres   [.] LWLockAttemptLock
   0.92%  [kernel]   [k] ext4_da_write_end
   0.90%  postgres   [.] heap_tuple_needs_eventual_freeze
   0.80%  postgres   [.] GetPrivateRefCountEntry
   0.78%  [kernel]   [k] fault_in_readable
   0.75%  postgres   [.] TransactionIdDidCommit
   0.72%  [kernel]   [k] lruvec_stat_mod_folio
   0.69%  [kernel]   [k] refill_stock
   0.69%  postgres   [.] TransactionLogFetch
   0.64%  [kernel]   [k] refill_obj_stock
   0.61%  postgres   [.] XLogInsert
   0.56%  postgres   [.] heap_page_prune_execute
   0.56%  [kernel]   [k] _raw_spin_lock
   0.50%  postgres   [.] uint32_hash


reshke=# copy zz from program 'yes 2 | head -n 180000000';
copy zz from program 'yes 2 | head -n 180000000';

delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')', '}')::int[])[2] = 7 ;
COPY 180000000
Time: 57489.665 ms (00:57.490)
COPY 180000000
Time: 60711.024 ms (01:00.711)
DELETE 1592921
Time: 375481.858 ms (06:15.482)
reshke=#
VACUUM FREEZE zz;
VACUUM
Time: 196442.340 ms (03:16.442)


Attachments:

  [text/plain] attach.txt (4.4K, 2-attach.txt)
  download | inline:

^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-27 13:08  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-08-27 13:08 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

Thanks for all the reviews. I'm working on responding to your previous
mails with a new version.

On Wed, Aug 27, 2025 at 8:55 AM Kirill Reshke <[email protected]> wrote:
>
> v6-0015:
> I chose to verify whether this single modification would be beneficial
> on the HEAD.
>
> Benchmark I did:
>
> ```
>
> \timing
> CREATE TABLE zz(i int);
> alter table zz set (autovacuum_enabled = false);
> TRUNCATE zz;
> copy zz from program 'yes 2 | head -n 180000000';
> copy zz from program 'yes 2 | head -n 180000000';
>
> delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')',
> '}')::int[])[2] = 7 ;
>
> VACUUM FREEZE zz;
> ```
>
> And I checked the perf top footprint for the last statement (vacuum).  My
> detailed results are attached. It is a HEAD vs HEAD+v6-0015 benchmark.
>
> TLDR: function inlining is indeed beneficial: the TransactionIdPrecedes
> function disappears from the perf top footprint, though query runtime
> does not change much. So, while not resulting in a query speedup, this
> can save CPU.
> Maybe we can derive an artificial benchmark that shows a query
> speedup, but for now I don't have one.

I'm not surprised that vacuum freeze does not show a speed up from the
function inlining.

This patch was key for avoiding a regression in the most contrived
worst case scenario example of setting the VM on-access. That is, if
you are pruning only a single tuple on the page as part of a SELECT
query that returns no tuples (think SELECT * FROM foo OFFSET N where N
is greater than the number of rows in the table), and I add
determining if the page is all visible, then the overhead of these
extra function calls in heap_prune_record_unchanged_lp_normal() is
noticeable.

We might be able to come up with a similar example in vacuum without
freeze since it will try to determine if the page is all-visible. Your
example is still running on my machine, though, so I haven't verified
this yet :)

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-27 19:02  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-08-27 19:02 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

Thanks for the review! Updates are in attached v7.

One note on 0022 in the set, which sets pd_prune_xid on insert: the
recently added index-killtuples isolation test was failing with this
patch applied. With the patch, the "access" step reports more heap
page hits than before. After some analysis, it seems one of the heap
pages in kill_prior_tuples table is now being pruned in an earlier
step. Somehow this leads to 4 hits counted instead of 3 (even though
there are only 4 blocks in the relation). I recall Bertrand mentioning
something in some other thread about hits being double counted with
AIO reads, so I'm going to try and go dig that up. But, for now, I've
modified the test -- I believe the patch is only revealing an issue
with instrumentation, not causing a bug.

On Tue, Aug 26, 2025 at 5:58 AM Kirill Reshke <[email protected]> wrote:
>
> visibilitymap_set_vmbyte is introduced in 0001 and removed in 0012.
> This seems strange to me; maybe we can avoid visibilitymap_set_vmbyte in
> the first place?

The reason for this is that in the earlier patch I introduce
visibilitymap_set_vmbyte() for one user while other users still use
visibilitymap_set(). But, by the end of the set, all users use
visibilitymap_set_vmbyte(). So I think it makes most sense for it to
be named visibilitymap_set() at that point. Until all users are
committed, the two functions both have to exist and need different
names.

> In 0001:
> should we add "Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
> LW_EXCLUSIVE));" in visibilitymap_set_vmbyte?

I don't want any operations on the heap buffer (including asserts) in
visibilitymap_set_vmbyte(). The heap block is only provided to look up
the VM bits.

I think your idea is a good one for the existing visibilitymap_set(),
though, so I've added a new patch to the set (0002) which does this. I
also added a similar assertion for the vmbuffer to
visibilitymap_set_vmbyte().

> Also here  `Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer),
> vmbuffer));` can be beneficial:

I had omitted this because the same logic is checked inside of
visibilitymap_set_vmbyte() with an error occurring if conditions are
not met. However, since the same is true in visibilitymap_set() and
heap_multi_insert() still asserted visibilitymap_pin_ok(), I've added
it back to my patch set.

> in heap_xlog_multi_insert:
> +
> + visibilitymap_pin(reln, blkno, &vmbuffer);
> + visibilitymap_set_vmbyte(....)
>
> Do we need to pin vmbuffer here? Looks like
> XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
> with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
> and COPY ... WITH (FREEZE true) test.

I thought the reason visibilitymap_set() did it was that it was
possible for the block of the VM corresponding to the block of the
heap to be different during recovery than it was when emitting the
record, and thus we needed the part of visibilitymap_pin() that
released the old vmbuffer and got the new one corresponding to the
heap block.

I can't quite think of how this could happen though.

Assuming it can't happen, then we can get rid of visibilitymap_pin()
(and add visibilitymap_pin_ok()) in both visibilitymap_set_vmbyte() and
visibilitymap_set(). I've done this to visibilitymap_set() in a
separate patch 0001. I would like other opinions/confirmation that the
block of the VM corresponding to the heap block cannot differ during
recovery from what it was when the record was emitted during
normal operation, though.
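
For reference, the heap-block-to-VM-block mapping is arithmetic on the
heap block number alone, which is consistent with the expectation above.
The following is a standalone sketch modeled on the macros in
visibilitymap.c. It assumes the default 8 kB block size and a 24-byte
page header, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Assumptions: 8 kB pages, 24-byte page header, 2 VM bits per heap block. */
#define BLCKSZ				8192
#define PAGE_HEADER_SIZE	24
#define BITS_PER_BYTE		8
#define BITS_PER_HEAPBLOCK	2

#define MAPSIZE				(BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_BYTE	(BITS_PER_BYTE / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE	(MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM block holds the bits for this heap block? */
static inline BlockNumber
heapblk_to_mapblock(BlockNumber heapBlk)
{
	return heapBlk / HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM block's map area? */
static inline uint32_t
heapblk_to_mapbyte(BlockNumber heapBlk)
{
	return (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of the heap block's two status bits within that byte. */
static inline uint8_t
heapblk_to_offset(BlockNumber heapBlk)
{
	uint8_t		offset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(offset < BITS_PER_BYTE);
	return offset;
}
```

Under these assumptions one VM block covers 32672 heap blocks, and no
input other than the heap block number enters the calculation.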

> > +#ifdef TRACE_VISIBILITYMAP
> > + elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
> > +#endif
>
> I can see this is merely copy-pasted from visibilitymap_set, but maybe
> display "flags" as well?

Done in attached.

> 4) visibilitymap_set receives an XLogRecPtr recptr parameter, which is
> set to the WAL record LSN during recovery and to InvalidXLogRecPtr
> otherwise. visibilitymap_set manages the VM page LSN based on this
> recptr value (inside the function logic). visibilitymap_set_vmbyte does
> the opposite and makes its caller responsible for page LSN management.
> Maybe we should keep these two functions akin to each other?

So, with visibilitymap_set_vmbyte(), the whole idea is to just update
the VM and then leave the WAL logging and other changes to the caller
(like marking the buffer dirty, setting the page LSN, etc). The series
of operations needed to make a persistent change are up to the caller.
visibilitymap_set() is meant to just make sure that the correct bits
in the VM are set for the given heap block.
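
The division of labor described above could be pictured with a toy
model like the following. Everything here is hypothetical: ToyVMPage,
toy_set_vmbits() and toy_caller_set_all_frozen() are not PostgreSQL
structures or functions, and the real code also deals with buffer locks,
critical sections and WAL insertion that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02
#define VM_VALID_BITS	0x03

/* Toy stand-in for a buffered VM page; not a PostgreSQL structure. */
typedef struct ToyVMPage
{
	uint8_t		bits;			/* status bits for one heap block */
	bool		dirty;
	uint64_t	lsn;
} ToyVMPage;

/*
 * Helper: only sets the requested bits and returns the previous ones.
 * It does not dirty the page or touch the LSN.
 */
static uint8_t
toy_set_vmbits(ToyVMPage *vm, uint8_t flags)
{
	uint8_t		old = vm->bits;

	assert((flags & VM_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible */
	assert(flags != VM_ALL_FROZEN);

	vm->bits |= flags;
	return old;
}

/*
 * Caller: owns the steps that make the change persistent, here reduced
 * to dirtying the page and stamping it with the record's LSN.
 */
static void
toy_caller_set_all_frozen(ToyVMPage *vm, uint64_t record_lsn)
{
	uint8_t		flags = VM_ALL_VISIBLE | VM_ALL_FROZEN;
	uint8_t		old = toy_set_vmbits(vm, flags);

	if ((old & flags) != flags)
	{
		vm->dirty = true;
		vm->lsn = record_lsn;
	}
}
```

The point of the split is that the helper stays the same whether the
caller is a normal backend or redo; only the caller's surrounding steps
differ.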

I looked at ways of making the current visibilitymap_set() API cleaner
-- like setting the heap page LSN with the VM recptr in the caller of
visibilitymap_set() instead. There wasn't a way of doing it that
seemed like enough of an improvement to merit the change.

Not to mention, the goal of the patchset is to remove the current
visibilitymap_set(), so I'm not too worried about parity between the
two functions. They may coexist for a while, but hopefully today's
visibilitymap_set() will eventually be removed.

- Melanie


Attachments:

  [text/x-patch] v7-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (24.9K, 2-v7-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
  download | inline diff:
From b980fc377a66d28acaf12a217e0fcd48a422ca69 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v7 05/22] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 143 +++++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 149 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  13 ++-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   7 +-
 6 files changed, 300 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf437b14eeb..05dce829eae 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,29 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		/* memcpy because vmflags is stored unaligned */
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = (Page) BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +111,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +122,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +170,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode), which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +246,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbyte(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal().  If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets changed, we can't set it all-visible */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d6a479f6984 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -289,12 +289,17 @@ typedef struct xl_heap_prune
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
-	 * unaligned
+	 * unaligned.
+	 *
+	 * Then, if XLHP_HAS_VMFLAGS is set, the VM flags follow, unaligned.
 	 */
 } xl_heap_prune;
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* If the record should update the VM, it contains the new VM bits */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0
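
Reader's note, not part of the patch: the redo-side rule in heap_xlog_prune_freeze() above for when the heap buffer is dirtied and when its LSN is bumped can be sketched as a standalone function. This is only a sketch under stated assumptions: decide_heap_redo() and HeapRedoDecision are hypothetical names, hint_bits_need_wal stands in for XLogHintBitIsNeeded(), page_already_all_visible stands in for PageIsAllVisible(page), and the VISIBILITYMAP_* values are copied in so the block compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the VM bit definitions so the sketch is self-contained. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical result type: what redo does to the heap buffer. */
typedef struct HeapRedoDecision
{
	bool		mark_buffer_dirty;
	bool		set_heap_lsn;
} HeapRedoDecision;

/*
 * Hypothetical helper mirroring the flag logic in the hunk above.
 * hint_bits_need_wal stands in for XLogHintBitIsNeeded().
 */
static HeapRedoDecision
decide_heap_redo(bool do_prune, int nplans, uint8_t vmflags,
				 bool page_already_all_visible, bool hint_bits_need_wal)
{
	HeapRedoDecision d;

	/* Pruning or freezing always dirties the page and bumps its LSN. */
	d.set_heap_lsn = d.mark_buffer_dirty = do_prune || nplans > 0;

	/* Setting PD_ALL_VISIBLE dirties the page ... */
	if ((vmflags & VISIBILITYMAP_VALID_BITS) && !page_already_all_visible)
	{
		d.mark_buffer_dirty = true;
		/* ... but only bumps the LSN when hint-bit changes are WAL-logged. */
		if (hint_bits_need_wal)
			d.set_heap_lsn = true;
	}

	return d;
}
```

For a record that only sets the VM bits on a page whose PD_ALL_VISIBLE is still clear, with checksums and wal_log_hints off, this yields a dirty buffer with an unchanged LSN, which is the case the REGBUF_NO_IMAGE logic on the logging side is meant to pair with.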



  [text/x-patch] v7-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.3K, 3-v7-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
  download | inline diff:
From a8f1b5ef988235f7d9c5fd24d10a139472df2e31 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v7 04/22] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0
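
Reader's note, not part of the patch: the function refactored here is the one that heap_page_is_all_visible_except_lpdead(), shown earlier in this message, builds on. The final check of that variant (the LP_DEAD items found on the page must be exactly the ones vacuum is about to set LP_UNUSED) can be sketched on its own. dead_items_match() is a hypothetical name, and the sketch assumes both arrays are in ascending offset order, which is what a front-to-back page scan produces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OffsetNumber;	/* local stand-in for PostgreSQL's typedef */

/*
 * Hypothetical helper: return true if the dead line pointers found while
 * scanning the page are exactly the expected ones.  Both arrays are assumed
 * to be sorted in ascending offset order.
 */
static bool
dead_items_match(const OffsetNumber *found, size_t nfound,
				 const OffsetNumber *expected, size_t nexpected)
{
	/* Caller expected no LP_DEAD items at all. */
	if (nexpected == 0)
		return nfound == 0;

	/* A different count means the sets cannot be identical. */
	if (nfound != nexpected)
		return false;

	/* Same count and same order: compare element-wise. */
	return memcmp(found, expected, nexpected * sizeof(OffsetNumber)) == 0;
}
```

Passing an empty expected set gives the behaviour of the plain heap_page_is_all_visible() wrapper: any LP_DEAD item on the page makes the check fail.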



  [text/x-patch] v7-0002-Add-assert-and-log-message-to-visibilitymap_set.patch (1.8K, 4-v7-0002-Add-assert-and-log-message-to-visibilitymap_set.patch)
  download | inline diff:
From 73abe01c6f7c69feca4f1f641c8c64d76cccc340 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 27 Aug 2025 10:07:29 -0400
Subject: [PATCH v7 02/22] Add assert and log message to visibilitymap_set

Add an assert to visibilitymap_set() that the provided heap buffer is
exclusively locked, which is expected.

Also, enhance the debug logging message to specify which VM flags were
set.

Based on a related suggestion by Kirill Reshke on an in-progress
patchset.

Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
---
 src/backend/access/heap/visibilitymap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 8f918e00af7..7440a65c404 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -255,7 +255,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	uint8		status;
 
 #ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
 #endif
 
 	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
@@ -269,6 +270,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
 		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
 
+	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
+
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-- 
2.43.0
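
Reader's note, not part of the patch: the new debug message prints the flags as a two-digit hex value ("0x%02X"). A small sketch of how such a value maps back to the VM bits follows. describe_vmflags() is a hypothetical helper, not something in the tree, and the bit values are copied locally so the block compiles standalone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the VM bit definitions so the sketch is self-contained. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/*
 * Hypothetical helper: format the flags the same way the debug message
 * does, followed by the names of the bits that are set.
 */
static void
describe_vmflags(uint8_t flags, char *out, size_t outlen)
{
	snprintf(out, outlen, "0x%02X%s%s", flags,
			 (flags & VISIBILITYMAP_ALL_VISIBLE) ? " ALL_VISIBLE" : "",
			 (flags & VISIBILITYMAP_ALL_FROZEN) ? " ALL_FROZEN" : "");
}
```

With this, a logged value of 0x03 reads as both bits set and 0x01 as all-visible only.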



  [text/x-patch] v7-0001-Remove-unneeded-VM-pin-from-VM-replay.patch (1.6K, 5-v7-0001-Remove-unneeded-VM-pin-from-VM-replay.patch)
  download | inline diff:
From 449f67324384d01d0e9601362f49bbe5b25f2676 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v7 01/22] Remove unneeded VM pin from VM replay

During replay of an operation setting bits in the visibility map,
XLogReadBufferForRedoExtended() will return a pinned buffer containing
the specified block of the visibility map. It will also be sure to
create the visibility map if it doesn't exist. Previously,
heap_xlog_visible() called visibilitymap_pin() even after getting a
buffer in this way. This would just have resulted in visibilitymap_pin()
returning early since the specified page was already present and pinned.
Thus, it wouldn't have resulted in another pin and we can just eliminate
this call to visibilitymap_pin().

Inspired by a related report by Kirill Reshke on an in-progress patch.

Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..e3e021f2bdd 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -295,8 +295,8 @@ heap_xlog_visible(XLogReaderState *record)
 		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
 
+		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
 		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
-- 
2.43.0



  [text/x-patch] v7-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (11.3K, 6-v7-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
  download | inline diff:
From 266450693f4df295a257b8316b285b8cfb25761a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v7 03/22] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 48 ++++++++++--------
 src/backend/access/heap/heapam_xlog.c   | 39 +++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 138 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7491cc3cb93..4ce0ec61692 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2516,8 +2513,23 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
+			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,29 +2658,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index e3e021f2bdd..cf437b14eeb 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7440a65c404..568bc83db9c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page and log
+ *		visibilitymap_set_vmbyte - set a bit in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %u",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
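
For readers who want to check the bit arithmetic in visibilitymap_set_vmbyte() without a PostgreSQL build, here is a minimal standalone sketch. It is not the patch's code: ModelVMPage and model_vm_set() are hypothetical names. The constants (two bits per heap block, four heap blocks per map byte, 8168 usable bytes per VM page) are assumptions based on a default 8 kB block size. Buffer pinning, locking and WAL are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match visibilitymapdefs.h; restated here for the model only. */
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03

/* Assumed layout: 2 bits per heap block, so 4 heap blocks per map byte. */
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4
/* Assumption: usable bytes in one VM page with the default 8 kB block size. */
#define MODEL_MAPSIZE		8168
#define HEAPBLOCKS_PER_PAGE	(MODEL_MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Hypothetical stand-in for a pinned, exclusively locked VM buffer. */
typedef struct ModelVMPage
{
	uint8_t		map[MODEL_MAPSIZE];
	bool		dirty;
} ModelVMPage;

/*
 * Set flags for heap_blk and return the previous status bits, mirroring the
 * shape of visibilitymap_set_vmbyte(). Bits are only ever added, never
 * cleared, and the page is dirtied only when the status actually changes.
 */
static uint8_t
model_vm_set(ModelVMPage *vm, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	/* Flags must be valid, and all-frozen is never set without all-visible. */
	assert((flags & VM_VALID_BITS) == flags);
	assert(flags != VM_ALL_FROZEN);

	status = (vm->map[map_byte] >> map_offset) & VM_VALID_BITS;
	if (flags != status)
	{
		vm->map[map_byte] |= (uint8_t) (flags << map_offset);
		vm->dirty = true;
	}

	return status;
}
```

Under these assumptions, heap block 5 lands in map byte 1 at bit offset 2, so setting both bits on a zeroed page leaves that byte at 0x0C. A second identical call returns the old status 0x03 and does not dirty the page again.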



  [text/x-patch] v7-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (5.8K, 7-v7-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 562ad6de26a83928e6bfd11bdc1dd9db1da601fe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v7 06/22] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0
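
The full-page-image decision that this patch adds to log_heap_prune_and_freeze() can be restated as a small pure function. This is a sketch only: model_heap_regbuf_flags() and the MODEL_REGBUF_* values are hypothetical placeholders, not the definitions from xloginsert.h. The hint_bits_need_wal parameter stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for this model; not taken from xloginsert.h. */
#define MODEL_REGBUF_STANDARD		0x01
#define MODEL_REGBUF_FORCE_IMAGE	0x02
#define MODEL_REGBUF_NO_IMAGE		0x04

/*
 * Choose registration flags for the heap buffer. A forced FPI wins;
 * otherwise the image may be skipped only when nothing was pruned or frozen
 * and setting PD_ALL_VISIBLE does not itself need to be WAL-protected.
 */
static uint8_t
model_heap_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
						bool set_pd_all_vis, bool hint_bits_need_wal)
{
	uint8_t		flags = MODEL_REGBUF_STANDARD;

	if (force_heap_fpi)
		flags |= MODEL_REGBUF_FORCE_IMAGE;
	else if (!do_prune &&
			 nfrozen == 0 &&
			 (!set_pd_all_vis || !hint_bits_need_wal))
		flags |= MODEL_REGBUF_NO_IMAGE;

	return flags;
}
```

In the empty-page case from lazy_scan_new_or_empty(), force_heap_fpi is true exactly when the page LSN is still invalid. That is the situation the old code handled with log_newpage_buffer().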



  [text/x-patch] v7-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 8-v7-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From f2365f999a652aabfda0a55761eb3fbb853529ae Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v7 07/22] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0
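
The two corruption cases that identify_and_fix_vm_corruption() distinguishes can be summarized as a pure classification. This is a rough sketch with hypothetical names (ModelVMCorruption, model_classify_vm_corruption). It models only the detection order, not the WARNING messages or the clearing of the VM and page bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this model. */
typedef enum ModelVMCorruption
{
	MODEL_VM_OK,
	/* VM bit set although the page-level PD_ALL_VISIBLE flag is clear */
	MODEL_VM_BIT_WITHOUT_PAGE_FLAG,
	/* PD_ALL_VISIBLE set on a page that still has LP_DEAD items */
	MODEL_VM_LP_DEAD_ON_ALL_VISIBLE
} ModelVMCorruption;

/*
 * vm_status_now is the VM status re-read while holding the heap buffer lock.
 * heap_blk_known_av is the possibly stale status seen when the block was
 * chosen for scanning.
 */
static ModelVMCorruption
model_classify_vm_corruption(bool heap_blk_known_av, bool page_all_visible,
							 uint8_t vm_status_now, int64_t nlpdead_items)
{
	if (heap_blk_known_av && !page_all_visible && vm_status_now != 0)
		return MODEL_VM_BIT_WITHOUT_PAGE_FLAG;

	if (nlpdead_items > 0 && page_all_visible)
		return MODEL_VM_LP_DEAD_ON_ALL_VISIBLE;

	return MODEL_VM_OK;
}
```

The recheck of the VM status matters in the first case: a stale heap_blk_known_av alone is not treated as corruption if the bit has since been cleared.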



  [text/x-patch] v7-0008-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 9-v7-0008-Combine-vacuum-phase-I-VM-update-cases.patch)
From 51a5cc0a3334a87a735e0ba5fb20e4bea72aac50 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v7 08/22] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..f6cdd9e6828 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2204,66 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
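
As a reading aid for the combined branch in lazy_scan_prune(), here is a sketch of the decision as a pure function. The names ModelVMUpdate and model_plan_vm_update are hypothetical. The step that ORs in the all-frozen flag is an assumption about the part of the hunk not shown above. The corruption check that precedes this branch is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match visibilitymapdefs.h; restated here for the model only. */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02

/* Hypothetical summary of what the combined branch decides to do. */
typedef struct ModelVMUpdate
{
	bool		update_vm;		/* call visibilitymap_set() at all? */
	uint8_t		flags;			/* VM bits to set */
	bool		dirty_heap;		/* set PD_ALL_VISIBLE and dirty the heap page? */
} ModelVMUpdate;

static ModelVMUpdate
model_plan_vm_update(bool all_visible, bool all_frozen,
					 bool all_visible_according_to_vm, bool vm_all_frozen,
					 bool page_all_visible, bool hint_bits_need_wal)
{
	ModelVMUpdate plan = {false, 0, false};

	/* all_frozen is only meaningful when all_visible is true. */
	if (!(all_visible &&
		  (!all_visible_according_to_vm ||
		   (all_frozen && !vm_all_frozen))))
		return plan;

	plan.update_vm = true;
	plan.flags = VM_ALL_VISIBLE;
	if (all_frozen)
		plan.flags |= VM_ALL_FROZEN;

	/*
	 * The heap page needs dirtying if PD_ALL_VISIBLE is newly set, or if
	 * checksums/wal_log_hints mean the page may be WAL-logged.
	 */
	plan.dirty_heap = !page_all_visible || hint_bits_need_wal;

	return plan;
}
```

The last field reflects the fix mentioned in the commit message: when only the all-frozen bit is being added and hint bits need WAL, the heap buffer is now dirtied before visibilitymap_set() may set its LSN.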



  [text/x-patch] v7-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 10-v7-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 37585b0dba82b169bfd6992f513a8a6e791bb4c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v7 09/22] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 72 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f6cdd9e6828..0c121fdf4e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0
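
For readers skimming the diff above: the two checks in identify_and_fix_vm_corruption() can be read as a pure classification over four inputs. The following is only a minimal sketch under that reading. The enum, the function name classify_vm_corruption(), and the boolean parameters are hypothetical and do not appear in the patch; the real function consults the page and the VM under buffer lock and also clears the offending bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification; these names are not part of the patch. */
typedef enum
{
	VM_OK,						/* no contradiction detected */
	VM_BIT_SET_PAGE_BIT_CLEAR,	/* VM bit set but PD_ALL_VISIBLE clear */
	VM_PAGE_AV_WITH_LP_DEAD		/* PD_ALL_VISIBLE set but LP_DEAD items exist */
} VmCorruptionKind;

/*
 * blk_known_av:     VM status remembered from before the page was locked
 * page_all_visible: stand-in for PageIsAllVisible(heap_page)
 * vm_bits_set_now:  stand-in for visibilitymap_get_status(...) != 0,
 *                   rechecked while holding the buffer lock
 * nlpdead_items:    number of LP_DEAD items found on the page
 */
VmCorruptionKind
classify_vm_corruption(bool blk_known_av, bool page_all_visible,
					   bool vm_bits_set_now, int64_t nlpdead_items)
{
	assert(nlpdead_items >= 0);

	/* The VM bit should never be set while the page-level bit is clear. */
	if (blk_known_av && !page_all_visible && vm_bits_set_now)
		return VM_BIT_SET_PAGE_BIT_CLEAR;

	/* An all-visible page should never contain LP_DEAD items. */
	if (nlpdead_items > 0 && page_all_visible)
		return VM_PAGE_AV_WITH_LP_DEAD;

	return VM_OK;
}
```

In the patch, either non-OK case emits a WARNING and clears VISIBILITYMAP_VALID_BITS; the second case also clears PD_ALL_VISIBLE and dirties the heap buffer.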



  [text/x-patch] v7-0011-Update-VM-in-pruneheap.c.patch (12.7K, 11-v7-0011-Update-VM-in-pruneheap.c.patch)
  download | inline diff:
From 0c0e63c0d7675559acd1f69203ba7423cd286352 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v7 11/22] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 107 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6c3653e776c..05227ce0339 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c121fdf4e6..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,88 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
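
The counter bookkeeping that remains in lazy_scan_prune() after this patch depends only on the old and new VM bits reported by heap_page_prune_and_freeze(). The following is a rough standalone sketch of that logic, assuming a hypothetical VmCounters struct and count_vm_transition() helper in place of the LVRelState fields and the *vm_page_frozen out-parameter. The two bit values are redefined locally so the sketch compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies so the sketch builds standalone. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the vm_new_* counters in LVRelState. */
typedef struct VmCounters
{
	int64_t		new_visible_pages;
	int64_t		new_visible_frozen_pages;
	int64_t		new_frozen_pages;
} VmCounters;

/*
 * Update the counters for one page.  Returns true when the page was newly
 * marked all-frozen in the VM (the role played by *vm_page_frozen).
 */
bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page was newly set all-visible, and possibly all-frozen too. */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			return true;
		}
		return false;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible; only the frozen bit is new. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		return true;
	}
	return false;
}
```

Because old_vmbits and new_vmbits are both 0 when the VM was not touched, a page whose VM status did not change falls through both branches and is not counted.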



  [text/x-patch] v7-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (3.0K, 12-v7-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch)
  download | inline diff:
From a865234b5efa4a994d5f7887bc8222aa172f4f4d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v7 10/22] Keep all_frozen updated too in
 heap_page_prune_and_freeze

We previously relied on only using all-visible and all-frozen together
but it's best to keep them both updated.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..6c3653e776c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0
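
The invariant this patch asserts, that all_frozen implies all_visible, holds as long as every site that clears all_visible clears all_frozen in the same statement. Here is a small sketch of that idea, using a hypothetical PageVisState struct in place of the two PruneState fields; the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two visibility flags kept in PruneState. */
typedef struct PageVisState
{
	bool		all_visible;
	bool		all_frozen;
} PageVisState;

/* Clear both flags together, as the patched call sites do. */
void
mark_not_all_visible(PageVisState *s)
{
	s->all_visible = s->all_frozen = false;
}

/* The condition checked by Assert(!all_frozen || all_visible). */
bool
vis_state_is_consistent(const PageVisState *s)
{
	return !s->all_frozen || s->all_visible;
}
```

Clearing only all_visible would leave the state { false, true }, which is exactly what the new Assert before START_CRIT_SECTION() is meant to catch.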



  [text/x-patch] v7-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (28.5K, 13-v7-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
  download | inline diff:
From a8977f72c3db92c0585bb906a34cd6e003f8a5e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v7 12/22] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 454 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 278 insertions(+), 221 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05227ce0339..cf9e5215d6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that old_vmbits and new_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If neither freezing nor updating the VM, avoid the extra bookkeeping.
+	 * Initializing both flags to false allows skipping the work to update
+	 * them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
+
+			/*
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
+			 */
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with a younger xmax than our so far
+			 * calculated conflict_xid, we must use this as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
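
As a reading aid for the pruneheap.c hunk above, the choice of which VM bits to set can be modeled in isolation. What follows is a standalone sketch, not code from the patch: choose_vmflags() and toy_vm_set() are hypothetical names, and the "VM" is a single byte rather than a real visibility map page. Only the three VISIBILITYMAP_* values are taken from PostgreSQL's visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/*
 * Hypothetical helper mirroring the "else if" condition in the hunk:
 * set bits only if the page is all-visible and either the VM did not
 * already say so, or the page is all-frozen and the VM lacks that bit.
 */
static uint8_t
choose_vmflags(bool all_visible, bool all_frozen,
			   bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		vmflags = 0;

	/* all_frozen is only meaningful when all_visible is true */
	assert(!all_frozen || all_visible);

	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
		if (all_frozen)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}

/*
 * Toy stand-in for a "set the bits, return the old bits" VM routine.
 * One byte models one heap block here (an assumption for illustration).
 */
static uint8_t
toy_vm_set(uint8_t *vmbyte, uint8_t flags)
{
	uint8_t		old = *vmbyte & VISIBILITYMAP_VALID_BITS;

	*vmbyte |= flags;
	return old;
}
```

When the returned old bits equal the requested flags, nothing changed, which corresponds to the case where the hunk resets do_set_vm and zeroes vmflags so that no VM update is WAL-logged.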


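The snapshot conflict horizon selection in the same hunk has three stages (pick a base horizon, raise it if pruning removed something newer, then drop it when only setting an already all-visible page all-frozen). A minimal standalone sketch of that ordering follows. It is an illustration under assumptions, not the patch's code: PruneDecision, pick_conflict_xid(), xid_follows() and xid_retreat() are hypothetical names, with the two xid helpers written to behave like TransactionIdFollows() and TransactionIdRetreat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Modulo-2^32 comparison for normal xids, plain comparison otherwise */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one xid, skipping over the special (non-normal) xids */
static TransactionId
xid_retreat(TransactionId x)
{
	do
	{
		x--;
	} while (x < FirstNormalTransactionId);
	return x;
}

/* Hypothetical bundle of the inputs the hunk consults */
typedef struct PruneDecision
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		all_frozen_except_lp_dead;
	bool		blk_known_av;
	bool		setting_all_frozen; /* vmflags has the all-frozen bit */
	TransactionId visibility_cutoff_xid;
	TransactionId oldest_xmin;
	TransactionId latest_xid_removed;
} PruneDecision;

static TransactionId
pick_conflict_xid(const PruneDecision *d)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* A record is only emitted if at least one action is taken */
	assert(d->do_prune || d->do_freeze || d->do_set_vm);

	/* Stage 1: base horizon from VM update or freezing */
	if (d->do_set_vm || (d->do_freeze && d->all_frozen_except_lp_dead))
		conflict_xid = d->visibility_cutoff_xid;
	else if (d->do_freeze)
		conflict_xid = xid_retreat(d->oldest_xmin);

	/* Stage 2: pruning may have removed something newer */
	if (xid_follows(d->latest_xid_removed, conflict_xid))
		conflict_xid = d->latest_xid_removed;

	/* Stage 3: VM-only update of an already all-visible page */
	if (!d->do_prune && !d->do_freeze && d->do_set_vm &&
		d->blk_known_av && d->setting_all_frozen)
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

For instance, a freeze that does not leave the page all-frozen with oldest_xmin = 1000 and nothing pruned yields 999 in this model, while a VM-only update that sets the all-frozen bit on a page already known all-visible yields InvalidTransactionId.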

  [text/x-patch] v7-0014-Remove-xl_heap_visible-entirely.patch (24.4K, 14-v7-0014-Remove-xl_heap_visible-entirely.patch)
From 3acf451bb40394330281ae82c5ef4c5c685438b4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v7 14/22] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 154 +----------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 30 insertions(+), 368 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4ce0ec61692..060a166e18f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2524,11 +2525,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8796,49 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 05dce829eae..539d38194f5 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -83,10 +83,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		memcpy(&vmflags, maindataptr, sizeof(uint8));
 		maindataptr += sizeof(uint8);
 
-		/*
-		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
-		 * because we already have XLHP_IS_CATALOG_REL.
-		 */
 		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 		/* Must never set all_frozen bit without also setting all_visible bit */
 		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -268,7 +264,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -279,143 +275,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -792,16 +651,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
-		visibilitymap_set_vmbyte(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
 		Assert(BufferIsDirty(vmbuffer));
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -1381,9 +1240,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 82127e8728b..ffc12314b41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbyte(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 568bc83db9c..8342ec1ff22 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page and log
- *      visibilitymap_set_vmbyte - set a bit in a pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d6a479f6984..34988d564fd 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -440,20 +439,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -497,11 +482,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
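
For readers following the removed and renamed visibilitymap_set() code in the patch above, here is a minimal sketch of the bit manipulation it performs. It is not the PostgreSQL implementation: vm_sketch_set() is a hypothetical helper, the map is a flat byte array with no page header, buffer locking, or WAL, and the two-bits-per-heap-block layout and the 0x01 value for VISIBILITYMAP_ALL_VISIBLE are assumptions stated here rather than shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values; ALL_FROZEN and VALID_BITS appear in visibilitymapdefs.h above. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01	/* assumed, not shown in the diff */
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Assumed layout for this sketch: two bits per heap block, four blocks per byte. */
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/*
 * Hypothetical helper: set flags for heap block heapBlk in a flat map array
 * and return the bits that were set before the call.  VM page boundaries,
 * locking, and WAL are deliberately left out.
 */
static uint8_t
vm_sketch_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	/* Only valid bits may be passed, and never all-frozen without all-visible. */
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
		map[mapByte] |= (uint8_t) (flags << mapOffset);

	return status;
}
```

Returning the previous bits is what lets a caller such as heap_page_prune_and_freeze() compare old_vmbits with vmflags and tell whether the call changed anything.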



  [text/x-patch] v7-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (7.1K, 15-v7-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch)
From 03c8a4257e02d217378080d611f4c066ad69c496 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v7 15/22] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check whether a
tuple's xmax is visible to all, meaning we can remove the tuple. But
future commits will use it to test whether a tuple's xmin is visible to
all when deciding whether to set the page all-visible in the VM. In that
case, it makes more sense to call the function
GlobalVisXidVisibleToAll(). Likewise, rename
GlobalVisTestIsRemovableFullXid() to GlobalVisFullXidVisible().
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ffc12314b41..715dfc16ba7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..4b8e5747239 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf987aed8d3..508bb379f87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..547c71fcbfe 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v7-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (4.1K, 16-v7-0013-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 7f96c6f7acd4e08ba0463c5b59394bb79a80005f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v7 13/22] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the field indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cf9e5215d6b..82127e8728b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 
 	/*
 	 * Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	bool		all_frozen_except_lp_dead = false;
 	bool		set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
 	 * tuple to know whether or not the page will be totally frozen.
 	 */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v7-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch (10.5K, 17-v7-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch)
From f3c3a44bb4dd5bf311a3a39876e1d26790321c2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v7 16/22] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page. Then, if we have
maintained the visibility_cutoff_xid, we compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, because
previously we could set all_visible to false as soon as we encountered a
live tuple newer than OldestXmin. However, these extra comparisons were
found not to be significant in a profile.
---
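
The once-per-page check described above can be sketched as follows. This is a simplified illustration, not the patch's code: XidSketch, xid_follows(), xid_visible_to_all(), and page_all_visible_sketch() are hypothetical names, the horizon is a single XID rather than a real GlobalVisState, and every xmin passed in is assumed to belong to a live, committed tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t XidSketch;		/* stand-in for TransactionId */

#define INVALID_XID_SKETCH		0u
#define FIRST_NORMAL_XID_SKETCH	3u

static bool
xid_is_normal(XidSketch xid)
{
	return xid >= FIRST_NORMAL_XID_SKETCH;
}

/* Modular "a is newer than b" comparison; only meaningful for normal XIDs. */
static bool
xid_follows(XidSketch a, XidSketch b)
{
	return (int32_t) (a - b) > 0;
}

/*
 * Hypothetical stand-in for GlobalVisXidVisibleToAll(): here, anything
 * strictly older than a single horizon XID counts as visible to everyone.
 */
static bool
xid_visible_to_all(XidSketch horizon, XidSketch xid)
{
	return (int32_t) (xid - horizon) < 0;
}

/*
 * Track the newest normal xmin while scanning the tuples, then compare it
 * against the horizon once, after the loop, instead of once per tuple.
 */
static bool
page_all_visible_sketch(const XidSketch *xmins, int ntuples, XidSketch horizon)
{
	XidSketch	cutoff = INVALID_XID_SKETCH;

	assert(ntuples >= 0);

	for (int i = 0; i < ntuples; i++)
	{
		if (xid_is_normal(xmins[i]) &&
			(!xid_is_normal(cutoff) || xid_follows(xmins[i], cutoff)))
			cutoff = xmins[i];
	}

	/* If the newest xmin is not visible to all, the page cannot be either. */
	if (xid_is_normal(cutoff) && !xid_visible_to_all(horizon, cutoff))
		return false;

	return true;
}
```

The cost trade-off is visible in the loop: each tuple only pays for a cheap XID-to-XID comparison, and the more expensive horizon test runs once per page.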
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 17 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 715dfc16ba7..ab79d8a3ed9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. When
+	 * setting the page all-visible or freezing all live tuples on the page,
+	 * it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a0fa371a06..777ec30eb82 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-												   TransactionId OldestXmin,
+												   GlobalVisState *vistest,
 												   OffsetNumber *deadoffsets,
 												   int allowed_num_offsets,
 												   bool *all_frozen,
@@ -2716,7 +2716,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
-											   vacrel->cutoffs.OldestXmin,
+											   vacrel->vistest,
 											   deadoffsets, num_offsets,
 											   &all_frozen, &visibility_cutoff_xid,
 											   &vacrel->offnum))
@@ -3459,13 +3459,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+	return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
 												  NULL, 0,
 												  all_frozen,
 												  visibility_cutoff_xid,
@@ -3500,7 +3500,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-									   TransactionId OldestXmin,
+									   GlobalVisState *vistest,
 									   OffsetNumber *deadoffsets,
 									   int allowed_num_offsets,
 									   bool *all_frozen,
@@ -3555,8 +3555,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3575,8 +3575,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
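
Editor's note on the pruneheap.c hunks above: the per-tuple comparison against OldestXmin is removed, and a single check on the newest live xmin runs after all tuples have been processed. The following is a minimal, self-contained sketch of that shape, not the patch's code. Every sketch_* name is hypothetical, and sketch_visible_to_all() is a simplified stand-in that assumes the horizon is a single normal XID; the real GlobalVisState test is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

#define SKETCH_INVALID_XID		((Xid) 0)
#define SKETCH_FIRST_NORMAL_XID	((Xid) 3)

static inline bool
sketch_xid_is_normal(Xid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Same shape as TransactionIdFollows(): modulo-2^32 for normal XIDs. */
static inline bool
sketch_xid_follows(Xid id1, Xid id2)
{
	if (!sketch_xid_is_normal(id1) || !sketch_xid_is_normal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/*
 * ASSUMPTION: "visible to all" is modelled as "logically precedes a single
 * normal horizon XID".  This only stands in for the GlobalVisState test.
 */
static inline bool
sketch_visible_to_all(Xid horizon, Xid xid)
{
	assert(sketch_xid_is_normal(horizon));
	return (int32_t) (xid - horizon) < 0;
}

/*
 * Track the newest xmin among the live tuples, then test it once at the end
 * instead of testing every tuple.  *cutoff receives the newest xmin found,
 * or SKETCH_INVALID_XID if no normal xmin was seen.
 */
static inline bool
sketch_page_all_visible(const Xid *live_xmins, size_t n, Xid horizon,
						Xid *cutoff)
{
	Xid			newest = SKETCH_INVALID_XID;

	for (size_t i = 0; i < n; i++)
	{
		if (sketch_xid_follows(live_xmins[i], newest) &&
			sketch_xid_is_normal(live_xmins[i]))
			newest = live_xmins[i];
	}

	*cutoff = newest;

	/* If the newest xmin is not visible to everyone, neither is the page. */
	if (sketch_xid_is_normal(newest) &&
		!sketch_visible_to_all(horizon, newest))
		return false;

	return true;
}
```

Testing only the newest xmin is sufficient in this model because any older xmin on the page precedes the horizon whenever the newest one does.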



  [text/x-patch] v7-0017-Inline-TransactionIdFollows-Precedes.patch (4.9K, 18-v7-0017-Inline-TransactionIdFollows-Precedes.patch)
From 364e7b71bbbe0b2e5956acbc33bea4fe8d1b3979 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v7 17/22] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
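
Editor's note: for readers unfamiliar with the comparison being inlined, here is a minimal standalone sketch of the modulo-2^32 test. The names xid_is_normal() and xid_precedes() are hypothetical stand-ins for TransactionIdIsNormal() and TransactionIdPrecedes(); the logic follows the function bodies in the diff. It assumes XIDs below 3 are the permanent ones, and it relies on out-of-range unsigned-to-int32_t conversion wrapping, which is implementation-defined in C but is what mainstream compilers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are treated as permanent (special) XIDs. */
#define SKETCH_FIRST_NORMAL_XID ((TransactionId) 3)

static inline bool
xid_is_normal(TransactionId xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/*
 * Is id1 logically < id2?  Permanent XIDs use plain unsigned comparison;
 * two normal XIDs are compared modulo 2^32 via the sign of their difference,
 * so the answer stays correct across counter wraparound.
 */
static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	if (!xid_is_normal(id1) || !xid_is_normal(id2))
		return id1 < id2;

	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

With these definitions, an XID just below 2^32 precedes a small normal XID allocated after wraparound, which a plain unsigned comparison would get backwards.
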
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0



  [text/x-patch] v7-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch (26.8K, 19-v7-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 55494f3135534695953ce03183f56ff331b3e26e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v7 19/22] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE set pages all-visible or all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
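
Editor's note: the plumbing described above (executor knows whether the relation is modified, scan descriptor carries that down, pruning receives either a VM buffer pointer or NULL) can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patch's code: every Sketch*/sketch_* name is hypothetical, the flag values are made up, and a 64-bit mask stands in for the Bitmapset used for es_modified_relids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are not PostgreSQL's. */
#define SKETCH_SO_TYPE_SEQSCAN	(1u << 0)
#define SKETCH_SO_ALLOW_VM_SET	(1u << 1)

typedef int SketchBuffer;
#define SKETCH_INVALID_BUFFER	0

typedef struct SketchScan
{
	uint32_t	flags;			/* scan options */
	SketchBuffer vmbuffer;		/* VM page buffer, if any */
} SketchScan;

/* Executor side: is range-table index rti in the set of modified rels? */
static inline bool
sketch_rel_is_modified(uint64_t modified_relids, int rti)
{
	assert(rti >= 0 && rti < 64);
	return ((modified_relids >> rti) & 1) != 0;
}

/* Scan setup: VM setting is allowed only when the rel is not modified. */
static inline void
sketch_beginscan(SketchScan *scan, bool modifies_rel)
{
	scan->flags = SKETCH_SO_TYPE_SEQSCAN;
	if (!modifies_rel)
		scan->flags |= SKETCH_SO_ALLOW_VM_SET;
	scan->vmbuffer = SKETCH_INVALID_BUFFER;
}

/* Page scan: what to hand to pruning; NULL means "do not set the VM". */
static inline SketchBuffer *
sketch_vmbuffer_for_prune(SketchScan *scan)
{
	if (scan->flags & SKETCH_SO_ALLOW_VM_SET)
		return &scan->vmbuffer;
	return NULL;
}
```

Passing a pointer rather than a buffer lets pruning pin the VM page lazily and leave the pin in the scan descriptor for the next heap page, while NULL doubles as the "not allowed" signal.
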
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 63 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 ++++++++++++++
 src/backend/access/table/tableam.c            | 39 ++++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 ++-
 src/backend/executor/nodeIndexscan.c          | 18 ++++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 ++++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  4 +-
 16 files changed, 276 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 060a166e18f..8a8b63b79f2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..c68283de6f2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80d055e5376..dad341cb265 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning or
+	 * freezing, avoid setting the visibility map if doing so would newly
+	 * dirty the heap page or, if the page is already dirty, would require
+	 * including a full-page image (FPI) of the heap page in the WAL. This
+	 * situation should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b8b9d2a85f7..a862701edbe 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Time::HiRes qw(usleep);
 use Test::More;
+use Time::HiRes qw(usleep);
 
 if ($ENV{enable_injection_points} ne 'yes')
 {
@@ -296,6 +297,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
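
Reviewer's sketch (not part of the patch): the executor changes above reduce to one membership test plus one flag bit. The standalone C below models that flow under loud assumptions: every `sketch_`/`SKETCH_` name is hypothetical, a 64-bit mask stands in for the `Bitmapset` in `es_modified_relids`, and only the value of the VM-set bit (1 << 10) is taken from the diff; the other flag values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for ScanOptions bits. */
enum sketch_scan_options
{
	SKETCH_SO_TYPE_SEQSCAN = 1 << 0,	/* placeholder value */
	SKETCH_SO_TYPE_BITMAPSCAN = 1 << 1, /* placeholder value */
	SKETCH_SO_ALLOW_PAGEMODE = 1 << 8,	/* placeholder value */
	SKETCH_SO_ALLOW_VM_SET = 1 << 10	/* same bit as in the patch */
};

/*
 * Stand-in for bms_is_member(scanrelid, es_modified_relids): here the set
 * of modified range-table indexes is just a 64-bit mask.
 */
static bool
sketch_rel_is_modified(uint64_t modified_relids, unsigned scanrelid)
{
	assert(scanrelid < 64);
	return (modified_relids & (UINT64_C(1) << scanrelid)) != 0;
}

/*
 * Mirrors the pattern in table_beginscan_vmset() and table_beginscan_bm():
 * the scan may try to set the VM only if the query does not modify the
 * relation being scanned.
 */
static uint32_t
sketch_scan_flags(uint32_t base_flags, bool modifies_rel)
{
	uint32_t	flags = base_flags;

	if (!modifies_rel)
		flags |= SKETCH_SO_ALLOW_VM_SET;

	return flags;
}
```

A caller would compute `modifies_rel` once per scan node and pass it down, which is what the `*_vmset` entry points do in the diff; the sketch only shows that the flag is derived, never set unconditionally.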



  [text/x-patch] v7-0018-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 20-v7-0018-Unset-all-visible-sooner-if-not-freezing.patch)
From e8962ba850206bc7de6f04ba2655c336e8108023 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v7 18/22] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab79d8a3ed9..80d055e5376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
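
Reviewer's sketch (not part of the patch): the difference between the deferred and the early unset can be seen in a toy model of the relevant PruneState fields. All `Sketch`/`sketch_` names are hypothetical, and the initial values in `sketch_init()` are an assumption of this sketch (all_frozen starts false when freezing will not be attempted).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of PruneState to the fields this patch touches. */
typedef struct SketchPruneState
{
	bool		attempt_freeze;
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} SketchPruneState;

static void
sketch_init(SketchPruneState *ps, bool attempt_freeze)
{
	ps->attempt_freeze = attempt_freeze;
	ps->all_visible = true;		/* assumption: optimistic start */
	ps->all_frozen = attempt_freeze;	/* assumption of this sketch */
	ps->lpdead_items = 0;
}

/* Models the bookkeeping done when a dead item is recorded. */
static void
sketch_record_dead(SketchPruneState *ps)
{
	/* Without freezing there is nothing to wait for: unset right away. */
	if (!ps->attempt_freeze)
		ps->all_visible = ps->all_frozen = false;

	ps->lpdead_items++;
}

/* Models the deferred unset once the freeze decision has been made. */
static void
sketch_finish(SketchPruneState *ps)
{
	if (ps->lpdead_items > 0)
		ps->all_visible = ps->all_frozen = false;
}
```

Both paths end with all_visible false when a dead item remains; the early unset only saves the intermediate bookkeeping for callers that will not freeze.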



  [text/x-patch] v7-0021-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 21-v7-0021-Reorder-heap_page_prune_and_freeze-parameters.patch)
From d48a5ea23fe37659673658378d8c47bece8ca282 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v7 21/22] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d943b0c64f..20f4a62fb16 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 777ec30eb82..120782fd8ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1992,11 +1992,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v7-0022-Set-pd_prune_xid-on-insert.patch (6.5K, 22-v7-0022-Set-pd_prune_xid-on-insert.patch)
From e46711ca4bb2e484693efa1d2dc8a8f444bfd094 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v7 22/22] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which in turn changes the number of hits reported in the
index-killtuples isolation test. This is a quirk of how hits are
tracked, which sometimes leads to them being double counted. This should
probably be fixed or changed independently.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8a8b63b79f2..b44176e3c70 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2546,8 +2550,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 539d38194f5..5ef49e19c7b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -474,6 +474,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -623,9 +629,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
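
Reviewer's sketch (not part of the patch): a toy model of when the insert paths above set the prunable hint. The `sketch_` names are hypothetical. The update rule in `sketch_set_prunable()` follows my understanding of the existing PageSetPrunable() macro (keep the oldest XID seen), which should be checked against bufpage.h; the guard in `sketch_multi_insert_hint()` is the condition added to heap_multi_insert() in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

/* XIDs below 3 are special (invalid, bootstrap, frozen) in PostgreSQL. */
#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Circular comparison of two normal XIDs (modulo 2^32). */
static bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Keep the hint at the oldest XID that might make the page prunable. */
static void
sketch_set_prunable(SketchXid *pd_prune_xid, SketchXid xid)
{
	assert(sketch_xid_is_normal(xid));
	if (*pd_prune_xid == SKETCH_INVALID_XID ||
		sketch_xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/*
 * Guard used on the multi-insert path: skip the hint if the page is being
 * set all-frozen or if the XID is not a normal one (bootstrap mode).
 */
static void
sketch_multi_insert_hint(SketchXid *pd_prune_xid, SketchXid xid,
						 bool all_frozen_set)
{
	if (!all_frozen_set && sketch_xid_is_normal(xid))
		sketch_set_prunable(pd_prune_xid, xid);
}
```

The point of the guard is that a page already marked all-frozen gains nothing from a later on-access pruning attempt, so the hint would only cause wasted work.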



  [text/x-patch] v7-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch (18.9K, 23-v7-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch)
From 630aea6e2fc3013c20bc938a0f7a115123da230d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v7 20/22] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup -- where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine, based on caller-provided options,
   heuristics, and state gathered during stage 2, whether or not to
   freeze tuples and set the page in the VM
4) execution -- where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
 1 file changed, 295 insertions(+), 176 deletions(-)
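
Reviewer's sketch (not part of the patch): the evaluation stage described above is easier to follow as a pure function. The standalone C below restates the freeze decision made by heap_page_will_freeze() in this diff, with every PostgreSQL call replaced by a boolean input. The struct and function names are hypothetical, and the sketch deliberately omits the side effects (pre-freeze checks, resetting nfrozen, unsetting all_visible for LP_DEAD items).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs to the freeze decision. */
typedef struct SketchFreezeInputs
{
	bool		attempt_freeze; /* caller asked for freezing at all */
	bool		freeze_required;	/* pagefrz.freeze_required */
	bool		all_visible;
	bool		all_frozen;
	int			nfrozen;		/* number of prepared freeze plans */
	bool		rel_needs_wal;	/* RelationNeedsWAL(relation) */
	bool		did_tuple_hint_fpi;
	bool		do_prune;
	bool		do_hint_full_or_prunable;
	bool		hint_bits_wal_logged;	/* XLogHintBitIsNeeded() */
	bool		buffer_needs_backup;	/* XLogCheckBufferNeedsBackup() */
} SketchFreezeInputs;

static bool
sketch_will_freeze(const SketchFreezeInputs *in)
{
	if (!in->attempt_freeze)
		return false;

	/* Freezing is mandatory to advance relfrozenxid/relminmxid. */
	if (in->freeze_required)
		return true;

	/* Opportunistic freezing must make the page all-frozen. */
	if (!(in->all_visible && in->all_frozen && in->nfrozen > 0))
		return false;

	if (!in->rel_needs_wal)
		return false;

	/* Freeze only if an FPI was emitted already or will be anyway. */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_backup;
	if (in->do_hint_full_or_prunable)
		return in->hint_bits_wal_logged && in->buffer_needs_backup;

	return false;
}
```

Written this way, it is clear that the opportunistic branch never fires for relations that skip WAL, which matches the nesting under RelationNeedsWAL() in the helper.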

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dad341cb265..5d943b0c64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page, even though they can be derived from buffer,
+ * to avoid extra BufferGetPage() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output parameters. vmflags and
+ * set_pd_all_visible are output parameters.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
-
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
-	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
-	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
 	/* Save these for the caller in case we later zero out vmflags */
 	presult->new_vmbits = vmflags;
 
-	/* Any error while applying the changes is critical */
+	/*
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
+	 */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-27 19:08  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-08-27 19:08 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Tue, Aug 26, 2025 at 4:01 PM Kirill Reshke <[email protected]> wrote:
>
> Few comments on 0003.
>
> 1) This patch introduces XLHP_HAS_VMFLAGS. However it lacks some
> helpful comments about this new status bit.

I added the ones you suggested in my v7 posted here [1].

> 2) Should we move conflict_xid = visibility_cutoff_xid; assignment
> just after heap_page_is_all_visible_except_lpdead call in
> lazy_vacuum_heap_page?

Why would we want to do that? We only want to set it if the page is
all visible, so we would have to guard it similarly.

> 3) Looking at this diff, I do not comprehend one bit: how are we
> protected from passing an all-visible page to lazy_vacuum_heap_page? I
> did not manage to reproduce such behaviour though.
>
> + if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
> + {
> + Assert(!PageIsAllVisible(page));
> + set_pd_all_vis = true;
> + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
> + PageSetAllVisible(page);
> + visibilitymap_set_vmbyte(vacrel->rel,
> + blkno,

So, for one, there is an assert just above this code in
lazy_vacuum_heap_page() that nunused > 0 -- so we know that the page
couldn't have been all-visible already because it had unused line
pointers.

Otherwise, if it was possible for an already all-visible page to get
here, the same thing would happen that happens on master --
heap_page_is_all_visible[_except_lpdead()] would return true and we
would try to set the VM which would end up being a no-op.
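
To illustrate the "no-op" case, here is a minimal standalone sketch. It is
a toy model, not the actual visibilitymap.c code: ToyVmSlot and
toy_vm_set() are hypothetical names, and only the two bit values mirror
VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. Setting bits that
are already set changes nothing and dirties nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN */
#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02
#define TOY_VM_VALID_BITS	(TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN)

/* Hypothetical stand-in for the two VM bits of a single heap block */
typedef struct ToyVmSlot
{
	uint8_t		bits;
	bool		dirty;
} ToyVmSlot;

/*
 * Set 'flags' in the slot and return the previous bits.  The slot is only
 * modified (and marked dirty) if at least one requested bit was clear.
 */
static uint8_t
toy_vm_set(ToyVmSlot *slot, uint8_t flags)
{
	uint8_t		old = slot->bits;

	/* all-frozen must never be requested without all-visible */
	assert(flags != 0 &&
		   (flags & ~TOY_VM_VALID_BITS) == 0 &&
		   flags != TOY_VM_ALL_FROZEN);

	if ((old & flags) != flags)
	{
		slot->bits |= flags;
		slot->dirty = true;
	}

	return old;
}
```

A caller can compare the returned old bits with the flags it passed in to
tell whether anything changed, which is the same kind of check the redo
code in the patch uses before setting the VM page LSN.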

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_YD0ecXeAh%2BDmJpzQOJwcRzmMyGdcc5W_0pEF78rYSJkQ%40mail.g...





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-08-28 09:11  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-08-28 09:11 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>

On Thu, 28 Aug 2025 at 00:02, Melanie Plageman
<[email protected]> wrote:

> > Do we need to pin vmbuffer here? Looks like
> > XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
> > with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
> > and COPY ... WITH (FREEZE true) test.
>
> I thought the reason visibilitymap_set() did it was that it was
> possible for the block of the VM corresponding to the block of the
> heap to be different during recovery than it was when emitting the
> record, and thus we needed the part of visibilitymap_pin() that
> released the old vmbuffer and got the new one corresponding to the
> heap block.
>
> I can't quite think of how this could happen though.
>
> Assuming it can't happen, then we can get rid of visibilitymap_pin()
> (and add visibilitymap_pin_ok()) in both visibilitymap_set_vmbyte() and
> visibilitymap_set(). I've done this to visibilitymap_set() in a
> separate patch 0001. I would like other opinions/confirmation that the
> block of the VM corresponding to the heap block cannot differ during
> recovery from what it was when the record was emitted during
> normal operation, though.

I did micro git-blame research here. I spotted only one related change
[0]. Looks like before this change pin was indeed needed.
But not after this change, so this visibilitymap_pin is just an oversight?
Related thread is [1]. I quickly checked the discussion in this
thread, and it looks like no one was bothered about these lines or VM
logging changes (in this exact pin buffer aspect). The discussion was
of other aspects of this commit.

[0] https://github.com/postgres/postgres/commit/2c03216d8311
[1] https://www.postgresql.org/message-id/533D6CBF.6080203%40vmware.com


-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-02 21:52  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-02 21:52 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <[email protected]> wrote:
>
> I did micro git-blame research here. I spotted only one related change
> [0]. Looks like before this change pin was indeed needed.
> But not after this change, so this visibilitymap_pin is just an oversight?
> Related thread is [1]. I quickly checked the discussion in this
> thread, and it looks like no one was bothered about these lines or VM
> logging changes (in this exact pin buffer aspect). The discussion was
> of other aspects of this commit.

Wow, thanks so much for doing that research. Looking at it myself, it
does indeed seem like just an oversight. It isn't harmful since it
won't take another pin, but it is confusing, so I think we should at
least remove it in master. I'm not as sure about back branches.

I would like someone to confirm that there is no way we could end up
with a different block of the VM containing the VM bits for a heap
block during recovery than during normal operation.
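
For reference, the lookup from heap block to VM block is plain arithmetic
on the heap block number. Below is a standalone sketch modeled on the
HEAPBLK_TO_MAPBLOCK-style macros in visibilitymap.c. It assumes 8 kB
blocks and a 24-byte page header (both are build-dependent), and the
toy_/TOY_ names are made up for illustration. It does not by itself
settle the question above; it only shows which inputs the mapping
depends on.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: default 8 kB block size and a 24-byte page header */
#define TOY_BLCKSZ				8192
#define TOY_PAGE_HEADER_SIZE	24
#define TOY_MAPSIZE				(TOY_BLCKSZ - TOY_PAGE_HEADER_SIZE)

/* Two VM bits (all-visible, all-frozen) per heap block */
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	(8 / TOY_BITS_PER_HEAPBLOCK)
#define TOY_HEAPBLOCKS_PER_PAGE	(TOY_MAPSIZE * TOY_HEAPBLOCKS_PER_BYTE)

/* Where the VM bits for one heap block live */
typedef struct ToyVmAddr
{
	uint32_t	map_block;		/* block number within the VM fork */
	uint32_t	map_byte;		/* byte within that block's map area */
	uint8_t		bit_offset;		/* bit offset within that byte */
} ToyVmAddr;

/* Pure function of the heap block number and the constants above */
static ToyVmAddr
toy_heapblk_to_vm(uint32_t heap_blk)
{
	ToyVmAddr	addr;

	addr.map_block = heap_blk / TOY_HEAPBLOCKS_PER_PAGE;
	addr.map_byte = (heap_blk % TOY_HEAPBLOCKS_PER_PAGE) / TOY_HEAPBLOCKS_PER_BYTE;
	addr.bit_offset = (uint8_t) ((heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK);

	/* the computed address must fall inside one VM page's map area */
	assert(addr.map_byte < TOY_MAPSIZE && addr.bit_offset < 8);

	return addr;
}
```

Under these assumptions one VM page covers 32672 heap blocks, so heap
block 32671 is the last one tracked by VM block 0 and heap block 32672 is
the first one tracked by VM block 1.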

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-02 23:11  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-09-02 23:11 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Sep 2, 2025 at 5:52 PM Melanie Plageman
<[email protected]> wrote:
>
> On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <[email protected]> wrote:
> >
> > I did micro git-blame research here. I spotted only one related change
> > [0]. Looks like before this change pin was indeed needed.
> > But not after this change, so this visibilitymap_pin is just an oversight?
> > Related thread is [1]. I quickly checked the discussion in this
> > thread, and it looks like no one was bothered about these lines or VM
> > logging changes (in this exact pin buffer aspect). The discussion was
> > of other aspects of this commit.
>
> Wow, thanks so much for doing that research. Looking at it myself, it
> does indeed seem like just an oversight. It isn't harmful since it
> won't take another pin, but it is confusing, so I think we should at
> least remove it in master. I'm not as sure about back branches.

I've updated the commit message in the patch set to reflect the
research you did in attached v8.

- Melanie


Attachments:

  [text/x-patch] v8-0002-Add-assert-and-log-message-to-visibilitymap_set.patch (1.8K, 2-v8-0002-Add-assert-and-log-message-to-visibilitymap_set.patch)
  download | inline diff:
From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 27 Aug 2025 10:07:29 -0400
Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set

Add an assert to visibilitymap_set() that the provided heap buffer is
exclusively locked, which is expected.

Also, enhance the debug logging message to specify which VM flags were
set.

Based on a related suggestion by Kirill Reshke on an in-progress
patchset.

Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
---
 src/backend/access/heap/visibilitymap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 953ad4a4843..7306c16f05c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -255,7 +255,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	uint8		status;
 
 #ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
 #endif
 
 	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
@@ -269,6 +270,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
 		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
 
+	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
+
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-- 
2.43.0



  [text/x-patch] v8-0001-Remove-unneeded-VM-pin-from-VM-replay.patch (1.6K, 3-v8-0001-Remove-unneeded-VM-pin-from-VM-replay.patch)
  download | inline diff:
From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay

Previously, heap_xlog_visible() called visibilitymap_pin() even after
getting a buffer from XLogReadBufferForRedoExtended() -- which returns a
pinned buffer containing the specified block of the visibility map.

This would just have resulted in visibilitymap_pin() returning early
since the specified page was already present and pinned, but it was
confusing extraneous code, so remove it.

It appears to be an oversight in 2c03216.

Author: Melanie Plageman <[email protected]>
Reported-by: Melanie Plageman <[email protected]>
Reported-by: Kirill Reshke <[email protected]>

Discussion: https://postgr.es/m/CALdSSPhu7WZd%2BEfQDha1nz%3DDC93OtY1%3DUFEdWwSZsASka_2eRQ%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5d48f071f53..69e2003a76f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -295,8 +295,8 @@ heap_xlog_visible(XLogReaderState *record)
 		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
 
+		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
 		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
-- 
2.43.0



  [text/x-patch] v8-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (24.9K, 4-v8-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
  download | inline diff:
From dc318358572f61efbd0e05aae2b9a077b422bcf5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v8 05/22] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 143 +++++++++++++++++++++---
 src/backend/access/heap/pruneheap.c    |  48 +++++++-
 src/backend/access/heap/vacuumlazy.c   | 149 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  13 ++-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |   7 +-
 6 files changed, 300 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0c902c87682..e68e61feade 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -70,13 +76,29 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 												rlocator);
 	}
 
+	/* Next are the optionally included vmflags. Copy them out for later use. */
+	if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+	{
+		/* memcpy because vmflags is stored unaligned */
+		memcpy(&vmflags, maindataptr, sizeof(uint8));
+		maindataptr += sizeof(uint8);
+
+		/*
+		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+		 * because we already have XLHP_IS_CATALOG_REL.
+		 */
+		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+		/* Must never set all_frozen bit without also setting all_visible bit */
+		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+	}
+
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +111,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +122,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,26 +170,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		Assert(BufferIsValid(buffer) &&
+			   BufferGetBlockNumber(buffer) == blkno);
+
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, update the free space map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
@@ -168,6 +246,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		else
 			UnlockReleaseBuffer(buffer);
 	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is *only* okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		visibilitymap_pin(reln, blkno, &vmbuffer);
+		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f1a8f938e9e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		xlrec.flags |= XLHP_HAS_VMFLAGS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	XLogRegisterData(&xlrec, SizeOfHeapPrune);
 	if (TransactionIdIsValid(conflict_xid))
 		XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterData(&vmflags, sizeof(uint8));
 
 	switch (reason)
 	{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+		PageSetLSN(BufferGetPage(buffer), recptr);
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f4e29aecf46..1d3feab4ded 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+												   TransactionId OldestXmin,
+												   OffsetNumber *deadoffsets,
+												   int allowed_num_offsets,
+												   bool *all_frozen,
+												   TransactionId *visibility_cutoff_xid,
+												   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2846,8 +2848,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2858,6 +2863,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2877,6 +2896,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbyte(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2886,7 +2917,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2895,39 +2929,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3593,6 +3600,25 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+												  NULL, 0,
+												  all_frozen,
+												  visibility_cutoff_xid,
+												  logging_offnum);
+}
+
 /*
  * Check if every tuple in the given page is visible to all current and future
  * transactions.
@@ -3606,23 +3632,35 @@ dead_items_cleanup(LVRelState *vacrel)
  * visible tuples. Sets *all_frozen to true if every tuple on this page is
  * frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+									   TransactionId OldestXmin,
+									   OffsetNumber *deadoffsets,
+									   int allowed_num_offsets,
+									   bool *all_frozen,
+									   TransactionId *visibility_cutoff_xid,
+									   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+	size_t		current_num_offsets = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
@@ -3654,9 +3692,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			current_dead_offsets[current_num_offsets++] = offnum;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
@@ -3723,7 +3760,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
-	return all_visible;
+	/* If we already know it's not all-visible, return false */
+	if (!all_visible)
+		return false;
+
+	/* If we weren't allowed any dead offsets, we're done */
+	if (allowed_num_offsets == 0)
+		return current_num_offsets == 0;
+
+	/* If the number of dead offsets has changed, that's wrong */
+	if (current_num_offsets != allowed_num_offsets)
+		return false;
+
+	Assert(deadoffsets);
+
+	/* The dead offsets must be the same dead offsets */
+	return memcmp(current_dead_offsets, deadoffsets,
+				  allowed_num_offsets * sizeof(OffsetNumber)) == 0;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	char	   *maindataptr = rec + SizeOfHeapPrune;
 
 	info &= XLOG_HEAP_OPMASK;
 	if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		{
 			TransactionId conflict_xid;
 
-			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+			memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+			maindataptr += sizeof(TransactionId);
 
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_HAS_VMFLAGS)
+		{
+			uint8		vmflags;
+
+			memcpy(&vmflags, maindataptr, sizeof(uint8));
+			maindataptr += sizeof(uint8);
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool vm_modified_heap_page,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d6a479f6984 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -289,12 +289,17 @@ typedef struct xl_heap_prune
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
-	 * unaligned
+	 * unaligned.
+	 *
+	 * Then, if XLHP_HAS_VMFLAGS is set, the VM flags follow, unaligned.
 	 */
 } xl_heap_prune;
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+/* If the record should update the VM, it contains the new VM bits */
+#define		XLHP_HAS_VMFLAGS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0

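The final check in heap_page_is_all_visible_except_lpdead() above accepts the page only if the LP_DEAD items found during the scan are exactly the ones the caller is about to set LP_UNUSED. A rough standalone sketch of that comparison follows. SkOffset and sk_dead_items_match are hypothetical names, and the sketch assumes both arrays list offsets in the same (ascending) order, which is what makes a plain memcmp() sufficient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for OffsetNumber. */
typedef uint16_t SkOffset;

/*
 * Return true if the dead items found on the page ('found') are exactly the
 * ones the caller expects ('expected').  With nexpected == 0 the caller
 * tolerates no dead items at all, and 'expected' may be NULL.
 */
bool
sk_dead_items_match(const SkOffset *found, size_t nfound,
					const SkOffset *expected, size_t nexpected)
{
	/* No dead items allowed: the page must not contain any. */
	if (nexpected == 0)
		return nfound == 0;

	/* A different count means the page changed under us. */
	if (nfound != nexpected)
		return false;

	assert(found != NULL && expected != NULL);

	/* Same count: every offset must be identical, position by position. */
	return memcmp(found, expected, nexpected * sizeof(SkOffset)) == 0;
}
```

The wrapper heap_page_is_all_visible() in the patch corresponds to calling this with expected = NULL and nexpected = 0.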

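The heapdesc.c and heapam_xlog.h hunks above define the order of the optional fields in the main data of the record: the fixed xl_heap_prune header, then the conflict horizon XID if present, then the VM flags byte if present, both unaligned. Here is a small self-contained sketch of walking that layout with a moving pointer, as heap2_desc() does with maindataptr. The SK_* flag values and the SkPruneOptional struct are assumptions made for illustration, and uint32_t stands in for TransactionId.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder flag bits; the real XLHP_* values live in heapam_xlog.h. */
#define SK_HAS_VMFLAGS			(1 << 0)
#define SK_HAS_CONFLICT_HORIZON	(1 << 3)

/* Hypothetical container for the decoded optional fields. */
typedef struct SkPruneOptional
{
	bool		has_conflict_xid;
	uint32_t	conflict_xid;
	bool		has_vmflags;
	uint8_t		vmflags;
} SkPruneOptional;

/*
 * Decode the optional fields that follow the fixed-size header.  'p' points
 * just past the header.  Returns the number of bytes consumed.
 */
size_t
sk_parse_prune_optional(uint8_t flags, const unsigned char *p,
						SkPruneOptional *out)
{
	const unsigned char *start = p;

	assert(p != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* Fields are unaligned, so copy them out instead of casting. */
	if (flags & SK_HAS_CONFLICT_HORIZON)
	{
		memcpy(&out->conflict_xid, p, sizeof(uint32_t));
		p += sizeof(uint32_t);
		out->has_conflict_xid = true;
	}

	if (flags & SK_HAS_VMFLAGS)
	{
		memcpy(&out->vmflags, p, sizeof(uint8_t));
		p += sizeof(uint8_t);
		out->has_vmflags = true;
	}

	return (size_t) (p - start);
}
```

Advancing one cursor keeps the VM flags at the right position whether or not the conflict horizon is present, which is why the patch replaces the fixed "rec + SizeOfHeapPrune" offset.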

  [text/x-patch] v8-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.3K, 5-v8-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From 0a31bc0bc1012de3ba3ce1194d5ce578f375025c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v8 04/22] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 932701d8420..f4e29aecf46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2906,8 +2910,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3591,9 +3595,16 @@ dead_items_cleanup(LVRelState *vacrel)
 
 /*
  * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3601,9 +3612,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3626,7 +3639,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3650,9 +3663,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3673,7 +3686,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3708,7 +3721,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0

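The refactoring in the patch above follows a common pattern: instead of handing the whole vacuum state to the helper, pass the two inputs it reads and an output pointer for the one field it writes. A toy sketch of that shape is below, under loud assumptions: sk_page_all_visible is a hypothetical function, the page is reduced to an array of xmin values, and a plain "<" replaces TransactionIdPrecedes(), which in PostgreSQL is a wraparound-aware comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy visibility check.  'xmins' models the xmin of each tuple on the page;
 * offsets are 1-based, and 0 plays the role of InvalidOffsetNumber.
 *
 * *logging_offnum is updated as each item is examined so that an error
 * callback owned by the caller can report which item was being processed.
 */
bool
sk_page_all_visible(const uint32_t *xmins, size_t ntuples,
					uint32_t oldest_xmin,
					uint16_t *logging_offnum)
{
	bool		all_visible = true;

	assert(logging_offnum != NULL);

	for (size_t i = 0; i < ntuples; i++)
	{
		*logging_offnum = (uint16_t) (i + 1);

		/* Simplified: the real check is TransactionIdPrecedes(). */
		if (!(xmins[i] < oldest_xmin))
		{
			all_visible = false;
			break;
		}
	}

	/* Clear the offset once the page has been processed. */
	*logging_offnum = 0;

	return all_visible;
}
```

A vacuum-style caller would pass the address of its own offnum field, while a caller without any vacuum state can pass the address of a throwaway local.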


  [text/x-patch] v8-0008-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 6-v8-0008-Combine-vacuum-phase-I-VM-update-cases.patch)
From e5d0f1c76b805de9de81d31e29c706fd5c8905e9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v8 08/22] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 406c30e6ecd..4d47a6b394a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2151,11 +2151,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2168,21 +2183,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2203,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0

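The merged branch condition in the patch above can be read as a small truth table over four booleans. The sketch below is a standalone illustration, not the patch's code: the sk_* names and SK_VM_* bit values are hypothetical, vm_all_visible stands in for all_visible_according_to_vm, vm_all_frozen for the result of VM_ALL_FROZEN(), and the earlier VM-corruption branches of the else-if chain are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real ones are in visibilitymapdefs.h. */
#define SK_VM_ALL_VISIBLE 0x01
#define SK_VM_ALL_FROZEN  0x02

/*
 * Update the VM when the page is all-visible and either the VM does not say
 * so yet, or the page is also all-frozen and the VM lacks the frozen bit.
 * page_all_frozen is only meaningful when page_all_visible is true.
 */
bool
sk_should_update_vm(bool page_all_visible, bool page_all_frozen,
					bool vm_all_visible, bool vm_all_frozen)
{
	return page_all_visible &&
		(!vm_all_visible || (page_all_frozen && !vm_all_frozen));
}

/* Bits to set once the decision above is "yes". */
uint8_t
sk_new_vmbits(bool page_all_frozen)
{
	uint8_t		flags = SK_VM_ALL_VISIBLE;

	if (page_all_frozen)
		flags |= SK_VM_ALL_FROZEN;

	return flags;
}
```

The second disjunct covers the case that used to be a separate else-if: the VM already has the all-visible bit and only the all-frozen bit is missing.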


  [text/x-patch] v8-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (5.8K, 7-v8-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 0d8d3a6b124f244933dbdc50fca90340715bffd5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v8 06/22] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f1a8f938e9e..956caeb69dc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1d3feab4ded..49c46d35486 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2917,6 +2931,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0



  [text/x-patch] v8-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 8-v8-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
  download | inline diff:
From 76ef56d01483308c635915f8b43e67741876225c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v8 09/22] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 72 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 956caeb69dc..72216126945 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d47a6b394a..64ae63dcb12 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1938,65 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2055,11 +1990,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2144,10 +2082,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0
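
The two corruption cases handled by identify_and_fix_vm_corruption() above can be modeled as a pure decision function. The following is only a minimal sketch, not code from the patch: the `Sketch*` names are hypothetical, and the page-level bit, the VM status, and the LP_DEAD count are passed in as plain values rather than read from buffers. It only shows the order in which the two checks are made and that either one short-circuits the normal VM update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; the patch itself returns a plain bool. */
typedef enum SketchVmCheck
{
	SKETCH_VM_OK,					/* no corruption found */
	SKETCH_VM_BIT_WITHOUT_PAGE_BIT, /* VM bit set, PD_ALL_VISIBLE clear */
	SKETCH_LP_DEAD_ON_AV_PAGE		/* LP_DEAD items on an all-visible page */
} SketchVmCheck;

/*
 * Classify the state of one heap page.
 *
 * blk_known_av:     what the VM said when the block was chosen for scanning
 * page_all_visible: current PD_ALL_VISIBLE on the heap page (under lock)
 * vm_status:        VM bits re-read under lock (0 means no bits set)
 * nlpdead_items:    LP_DEAD items found while pruning
 */
static SketchVmCheck
sketch_classify_vm_state(bool blk_known_av, bool page_all_visible,
						 uint8_t vm_status, int64_t nlpdead_items)
{
	/*
	 * The VM bit must never be set while the page-level bit is clear. The
	 * recheck of vm_status covers the case where the bit was legitimately
	 * cleared after the block was chosen.
	 */
	if (blk_known_av && !page_all_visible && vm_status != 0)
		return SKETCH_VM_BIT_WITHOUT_PAGE_BIT;

	/* An all-visible page must never contain LP_DEAD items. */
	if (nlpdead_items > 0 && page_all_visible)
		return SKETCH_LP_DEAD_ON_AV_PAGE;

	return SKETCH_VM_OK;
}
```

In the patch, both non-OK outcomes clear the VM bits (the second also clears PD_ALL_VISIBLE and dirties the heap buffer), and the caller then skips the normal VM update for that page.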



  [text/x-patch] v8-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 9-v8-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch)
  download | inline diff:
From 393bce514362c05bed2eba71f1bfad649507d058 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v8 07/22] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 49c46d35486..406c30e6ecd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1932,6 +1938,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2078,9 +2144,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2132,45 +2203,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0



  [text/x-patch] v8-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (11.3K, 10-v8-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
  download | inline diff:
From 07f31099754636ec9dabf6cca06c33c4b19c230c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v8 03/22] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 48 ++++++++++--------
 src/backend/access/heap/heapam_xlog.c   | 39 +++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 138 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..035280dc30a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2516,8 +2513,23 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
+			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbyte(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,29 +2658,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 69e2003a76f..0c902c87682 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..64dff7a0026 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page and log
+ *		visibilitymap_set_vmbyte - set a bit in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %u",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also, never clear bits with this function. */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
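
For readers unfamiliar with the VM layout, the byte arithmetic in visibilitymap_set_vmbyte() above can be tried in isolation. The following is a standalone sketch under stated assumptions, not the patch's code: it assumes two VM bits per heap block and a map area of 8168 bytes per VM page (an 8192-byte block minus a 24-byte page header), and it operates on a plain byte array instead of a locked buffer. All `SKETCH_`/`sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ALL_VISIBLE			0x01
#define SKETCH_ALL_FROZEN			0x02
#define SKETCH_VALID_BITS			0x03
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_HEAPBLOCKS_PER_BYTE	4
#define SKETCH_MAPSIZE				8168	/* assumed map bytes per VM page */
#define SKETCH_HEAPBLOCKS_PER_PAGE	(SKETCH_MAPSIZE * SKETCH_HEAPBLOCKS_PER_BYTE)

/* Byte within the VM page that holds the bits for heap_blk. */
static uint32_t
sketch_map_byte(uint32_t heap_blk)
{
	return (heap_blk % SKETCH_HEAPBLOCKS_PER_PAGE) / SKETCH_HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of heap_blk's two bits within that byte. */
static uint8_t
sketch_map_offset(uint32_t heap_blk)
{
	return (uint8_t) ((heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
					  SKETCH_BITS_PER_HEAPBLOCK);
}

/* Read the current two-bit status for heap_blk. */
static uint8_t
sketch_get_status(const uint8_t *map, uint32_t heap_blk)
{
	return (map[sketch_map_byte(heap_blk)] >> sketch_map_offset(heap_blk)) &
		SKETCH_VALID_BITS;
}

/*
 * OR the given flags into the map and return the previous status. Like the
 * function in the patch, this can only set bits, never clear them.
 */
static uint8_t
sketch_set_vmbyte(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint8_t		status = sketch_get_status(map, heap_blk);

	/* Flags must be valid, and all-frozen requires all-visible. */
	assert((flags & SKETCH_VALID_BITS) == flags);
	assert(flags != SKETCH_ALL_FROZEN);

	if (flags != status)
		map[sketch_map_byte(heap_blk)] |=
			(uint8_t) (flags << sketch_map_offset(heap_blk));

	return status;
}
```

With these assumptions, heap block 5 lives in byte 1 at bit offset 2, so setting both flags on an empty map leaves that byte equal to 0x0C. The real function additionally checks the pinned buffer, requires an exclusive lock, and marks the buffer dirty when it changes anything.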



  [text/x-patch] v8-0014-Remove-xl_heap_visible-entirely.patch (24.4K, 11-v8-0014-Remove-xl_heap_visible-entirely.patch)
  download | inline diff:
From 91d6e524a46c4d19dfe82c368ee98a950753cfb4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v8 14/22] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 154 +----------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 30 insertions(+), 368 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 035280dc30a..88f880cfd15 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2524,11 +2525,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8796,49 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index e68e61feade..83a5f3dbc34 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -83,10 +83,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		memcpy(&vmflags, maindataptr, sizeof(uint8));
 		maindataptr += sizeof(uint8);
 
-		/*
-		 * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
-		 * because we already have XLHP_IS_CATALOG_REL.
-		 */
 		Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 		/* Must never set all_frozen bit without also setting all_visible bit */
 		Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -268,7 +264,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -279,143 +275,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -792,16 +651,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
 		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
-		visibilitymap_set_vmbyte(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -1381,9 +1240,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef9bb0c273a..de656087941 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5129c13fee9..66ce30ddf03 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,8 +1886,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbyte(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2753,9 +2753,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbyte(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dff7a0026..8342ec1ff22 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page and log
- *      visibilitymap_set_vmbyte - set a bit in a pinned page
+ *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d6a479f6984..34988d564fd 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -440,20 +439,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -497,11 +482,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
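
For readers following along, the bit manipulation that the surviving
visibilitymap_set() performs can be sketched outside the tree. This is
only a toy model under stated assumptions: the toy_* names, the
TOY_MAP_BYTES size and the plain byte array standing in for a VM page
are hypothetical, and locking, MarkBufferDirty() and WAL are left out
entirely. The two-bits-per-heap-block layout and the "return the old
bits" contract follow the code shown in the patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as VISIBILITYMAP_* in visibilitymapdefs.h */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02
#define VM_VALID_BITS	0x03

#define BITS_PER_HEAPBLOCK	2	/* two status bits per heap block */
#define HEAPBLOCKS_PER_BYTE	4	/* so four heap blocks share one byte */
#define TOY_MAP_BYTES		16	/* hypothetical size of the toy map */

/* Read the two status bits for heap_blk from the toy map. */
static uint8_t
toy_vm_get(const uint8_t *map, uint32_t heap_blk)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(map_byte < TOY_MAP_BYTES);
	return (map[map_byte] >> map_offset) & VM_VALID_BITS;
}

/*
 * Set flags for heap_blk and return the bits that were set beforehand,
 * so the caller can tell whether anything actually changed.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert(map_byte < TOY_MAP_BYTES);
	assert((flags & VM_VALID_BITS) == flags);
	/* Must never set all_frozen without also setting all_visible */
	assert(flags != VM_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VM_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

A caller would compare the return value with the flags it passed, much
as heap_xlog_prune_freeze() does with old_vmbits, and only stamp the
page LSN when the two differ.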



  [text/x-patch] v8-0011-Update-VM-in-pruneheap.c.patch (12.7K, 12-v8-0011-Update-VM-in-pruneheap.c.patch)
From 24e738f55987f2690acb8090f9aa78b7d7507d98 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v8 11/22] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 107 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a562573763a..fcf054d04a8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM, or it is and
+		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 64ae63dcb12..892081033cc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1932,7 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1948,7 +1947,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1977,6 +1977,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1985,10 +1986,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2080,88 +2077,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and updating the page's
+	 * bits in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
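
The counting that lazy_scan_prune() does with old_vmbits and new_vmbits
at the end of this patch can be tried in isolation. Below is a minimal
standalone sketch, not code from the patch: VmCounters and
count_vm_transition() are hypothetical names standing in for the
LVRelState counters and the inline logic, and the two bit values are
copied from visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the three counters kept in LVRelState. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Classify one page's VM transition.  Returns true if the page was newly set
 * all-frozen, which is what lazy_scan_prune() stores in *vm_page_frozen.
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page went from not all-visible to all-visible. */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			return true;
		}
		return false;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is now also all-frozen. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		return true;
	}

	/* No VM update happened (new_vmbits == 0) or nothing new was set. */
	return false;
}
```

Because new_vmbits is copied from vmflags, it stays 0 whenever the VM
did not need updating, so a page that was already all-visible and
all-frozen falls through both branches and is not counted.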



  [text/x-patch] v8-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (3.0K, 13-v8-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch)
From cd33da95773743e046219d8bc94d9c929cd5be7f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v8 10/22] Keep all_frozen updated too in
 heap_page_prune_and_freeze

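Previously, all_frozen was only considered valid when all_visible was
also set, so it was not cleared every time all_visible was cleared.
Keep both flags updated so that all_frozen is never set without
all_visible.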
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 72216126945..a562573763a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0



  [text/x-patch] v8-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (28.5K, 14-v8-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From 41eb35a7bbeff71763c2be79ae4ecae7f29e4d6a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v8 12/22] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
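
Since the combined record now carries a single snapshot conflict
horizon for pruning, freezing and the VM update, the order in which the
candidates are considered matters. The following is a standalone sketch
of the selection as I read it from the pruneheap.c hunk in this patch,
not the patch code itself: Xid, HorizonInputs and choose_conflict_xid()
are hypothetical names, and xid_follows()/xid_retreat() are simplified
copies of TransactionIdFollows()/TransactionIdRetreat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;					/* stand-in for TransactionId */

#define INVALID_XID			((Xid) 0)	/* InvalidTransactionId */
#define FIRST_NORMAL_XID	((Xid) 3)	/* FirstNormalTransactionId */
#define VM_ALL_FROZEN_BIT	0x02		/* VISIBILITYMAP_ALL_FROZEN */

/* Wraparound-aware "a is newer than b", as in TransactionIdFollows(). */
static bool
xid_follows(Xid a, Xid b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one normal XID, as in TransactionIdRetreat(). */
static Xid
xid_retreat(Xid x)
{
	do
		x--;
	while (x < FIRST_NORMAL_XID);
	return x;
}

/* Hypothetical bundle of the values the real code reads from local state. */
typedef struct HorizonInputs
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		all_frozen_except_lp_dead;
	bool		blk_known_av;
	uint8_t		vmflags;
	Xid			visibility_cutoff_xid;
	Xid			oldest_xmin;
	Xid			latest_xid_removed;
} HorizonInputs;

static Xid
choose_conflict_xid(const HorizonInputs *in)
{
	Xid			conflict_xid = INVALID_XID;

	/* Setting the VM, or freezing a page that ends up all-frozen. */
	if (in->do_set_vm || (in->do_freeze && in->all_frozen_except_lp_dead))
		conflict_xid = in->visibility_cutoff_xid;
	/* Freezing only part of the page: pessimistic OldestXmin - 1. */
	else if (in->do_freeze)
		conflict_xid = xid_retreat(in->oldest_xmin);

	/* Removed tuples with a newer xmax take precedence. */
	if (xid_follows(in->latest_xid_removed, conflict_xid))
		conflict_xid = in->latest_xid_removed;

	/* Only setting all-frozen on an already all-visible page: no conflict. */
	if (!in->do_prune && !in->do_freeze && in->do_set_vm &&
		in->blk_known_av && (in->vmflags & VM_ALL_FROZEN_BIT))
		conflict_xid = INVALID_XID;

	return conflict_xid;
}
```

For example, with do_freeze set, the page not all-frozen and
oldest_xmin = 500, the sketch yields 499; if pruning also removed a
tuple with xmax 600, it yields 600 instead.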
---
 src/backend/access/heap/pruneheap.c  | 454 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 278 insertions(+), 221 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fcf054d04a8..7cef05be5d0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that old_vmbits and new_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping. Initializing all_visible to false allows skipping the work
+	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = (vmflags & VISIBILITYMAP_VALID_BITS) != 0;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
 			 */
-			if (do_freeze)
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
+
+			/*
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
+			 */
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with a younger xmax than our so far
+			 * calculated conflict_xid, we must use this as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || (vmflags & VISIBILITYMAP_VALID_BITS));
+
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 892081033cc..5129c13fee9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2013,34 +2013,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2074,8 +2046,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
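
A note on the patch above: PruneFreezeResult now reports only old_vmbits and new_vmbits, and lazy_scan_prune() counts newly set pages "for the purposes of logging". The sketch below shows one way a caller could derive "newly all-visible" and "newly all-frozen" from the two bitmasks. The flag values mirror visibilitymapdefs.h but are redefined locally so the snippet compiles alone; VmTransition and classify_vm_transition() are hypothetical names, not part of the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the visibility map flag values, so this compiles alone. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Hypothetical summary type; not part of the patch set. */
typedef struct VmTransition
{
	bool		newly_all_visible;
	bool		newly_all_frozen;
} VmTransition;

/*
 * Compare the VM bits before and after pruning and report which bits were
 * newly set.  Assumes, as the removed comment in the patch states, that
 * all-frozen is only meaningful when all-visible is also set.
 */
static VmTransition
classify_vm_transition(uint8_t old_vmbits, uint8_t new_vmbits)
{
	VmTransition t;

	assert((new_vmbits & ~VISIBILITYMAP_VALID_BITS) == 0);
	assert(!(new_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
		   (new_vmbits & VISIBILITYMAP_ALL_VISIBLE));

	t.newly_all_visible = !(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE);
	t.newly_all_frozen = !(old_vmbits & VISIBILITYMAP_ALL_FROZEN) &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN);

	return t;
}
```

A page that was already all-visible and becomes all-frozen reports only newly_all_frozen, so the two counts can be logged separately.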



  [text/x-patch] v8-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (4.1K, 15-v8-0013-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 3202f56fe96c30c79c03fa4e6090ae67012840aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v8 13/22] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7cef05be5d0..ef9bb0c273a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 
 	/*
 	 * Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	bool		all_frozen_except_lp_dead = false;
 	bool		set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
 	 * tuple to know whether or not the page will be totally frozen.
 	 */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v8-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (7.1K, 16-v8-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch)
From 2189e119a3b666cb073821f8cf61ea00b9317863 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v8 15/22] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use it to test if a tuple's xmin is visible to all when
determining whether to set the page all-visible in the VM. In that
case, it makes more sense to call the function
GlobalVisXidVisibleToAll(). GlobalVisTestIsRemovableFullXid() is
likewise renamed to GlobalVisFullXidVisible().
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index de656087941..1300d0e89f3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..5c121cd72f5 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisible(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..547c71fcbfe 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
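
For readers unfamiliar with the function being renamed above: as I understand procarray.c, the test works against two bounds, maybe_needed and definitely_needed, and only recomputes the horizon when an XID falls between them. The toy model below illustrates that shape. It is a sketch under assumptions: ToyVisState, ToyRefresh, toy_xid_visible_to_all() and toy_refresh_advance() are hypothetical, XIDs are plain 64-bit integers, and the real code decides whether a refresh is worthwhile rather than always performing one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for GlobalVisState; only the two bounds are modeled. */
typedef struct ToyVisState
{
	uint64_t	maybe_needed;		/* XIDs older than this are visible to all */
	uint64_t	definitely_needed;	/* XIDs at or after this may still be needed */
} ToyVisState;

/* Hypothetical hook standing in for recomputing the horizon. */
typedef void (*ToyRefresh) (ToyVisState *state);

/* Example refresh: pretend recomputation moved the lower bound all the way up. */
static void
toy_refresh_advance(ToyVisState *state)
{
	state->maybe_needed = state->definitely_needed;
}

/*
 * Return true if fxid is known to be visible to everyone.  XIDs between the
 * two bounds are undecided until the bounds are refreshed; without a refresh
 * the conservative answer is false.
 */
static bool
toy_xid_visible_to_all(ToyVisState *state, uint64_t fxid, ToyRefresh refresh)
{
	assert(state->maybe_needed <= state->definitely_needed);

	if (fxid < state->maybe_needed)
		return true;
	if (fxid >= state->definitely_needed)
		return false;

	if (refresh != NULL)
	{
		refresh(state);
		return fxid < state->maybe_needed;
	}
	return false;
}
```

The conservative "false" in the undecided range is what makes the test safe for both uses named in the commit message: a tuple is not removed, and a page is not set all-visible, unless the XID is known to be visible to all.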



  [text/x-patch] v8-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch (10.5K, 17-v8-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch)
From e37454b4baa95e070c3a5d39affcc2d4ae733ad3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v8 16/22] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. If
visibility_cutoff_xid has been maintained, we then compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, as we
previously set all_visible to false as soon as we encountered a live
tuple newer than OldestXmin. However, these extra comparisons were found
not to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 17 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1300d0e89f3..f083189fccc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 66ce30ddf03..61c6b3d21ac 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-												   TransactionId OldestXmin,
+												   GlobalVisState *vistest,
 												   OffsetNumber *deadoffsets,
 												   int allowed_num_offsets,
 												   bool *all_frozen,
@@ -2715,7 +2715,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
-											   vacrel->cutoffs.OldestXmin,
+											   vacrel->vistest,
 											   deadoffsets, num_offsets,
 											   &all_frozen, &visibility_cutoff_xid,
 											   &vacrel->offnum))
@@ -3458,13 +3458,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+	return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
 												  NULL, 0,
 												  all_frozen,
 												  visibility_cutoff_xid,
@@ -3499,7 +3499,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
-									   TransactionId OldestXmin,
+									   GlobalVisState *vistest,
 									   OffsetNumber *deadoffsets,
 									   int allowed_num_offsets,
 									   bool *all_frozen,
@@ -3554,8 +3554,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3574,8 +3574,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
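
To illustrate the once-per-page check described in the commit message above, here is a minimal standalone sketch. It assumes every tuple passed in is live and committed, and it replaces the GlobalVisState with a single horizon XID; ToyXid, toy_visible_to_all() and toy_page_all_visible() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)	/* 0, 1 and 2 are special XIDs */

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Same shape as TransactionIdFollows(): is a logically > b? */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (!toy_xid_is_normal(a) || !toy_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical single-horizon stand-in for GlobalVisXidVisibleToAll(). */
static bool
toy_visible_to_all(ToyXid horizon, ToyXid xid)
{
	return (int32_t) (xid - horizon) < 0;
}

/*
 * Track the newest normal xmin while scanning, then do the comparatively
 * expensive visibility test once for the whole page instead of per tuple.
 */
static bool
toy_page_all_visible(const ToyXid *live_xmins, size_t ntuples,
					 ToyXid horizon, ToyXid *cutoff_out)
{
	ToyXid		newest = TOY_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (toy_xid_follows(live_xmins[i], newest) &&
			toy_xid_is_normal(live_xmins[i]))
			newest = live_xmins[i];
	}

	*cutoff_out = newest;

	/* A page holding only frozen xmins has no normal cutoff to test. */
	if (toy_xid_is_normal(newest) && !toy_visible_to_all(horizon, newest))
		return false;
	return true;
}
```

The trade-off is the one the commit message names: the loop no longer exits early on the first too-new xmin, in exchange for a single horizon test per page.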



  [text/x-patch] v8-0017-Inline-TransactionIdFollows-Precedes.patch (4.9K, 18-v8-0017-Inline-TransactionIdFollows-Precedes.patch)
From 79e75d5fe9e40964ce2d479f8207c9e56749f41f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v8 17/22] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)
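
As background for the functions being moved, the sketch below reproduces the modulo-2^32 comparison in standalone form to show why it stays correct across XID wraparound. ToyXid and toy_xid_precedes() are hypothetical names; the logic follows TransactionIdPrecedes() as shown in this patch. Like the original, it relies on the unsigned-to-signed conversion behaving as two's complement, which holds on the platforms I am aware of.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)	/* 0, 1 and 2 are permanent XIDs */

/*
 * Is id1 logically < id2?  Permanent XIDs compare as plain unsigned values;
 * normal XIDs compare by the sign of their 32-bit difference, so an XID
 * issued just before wraparound still precedes one issued just after it.
 */
static inline bool
toy_xid_precedes(ToyXid id1, ToyXid id2)
{
	int32_t		diff;

	if (id1 < TOY_FIRST_NORMAL_XID || id2 < TOY_FIRST_NORMAL_XID)
		return id1 < id2;

	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

For example, toy_xid_precedes(4294967290u, 10) is true even though the first value is numerically larger, because the two XIDs are only 16 apart modulo 2^32.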

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
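
For readers following along, the wraparound-aware comparison that the patch above moves into transam.h as static inline functions can be tried in isolation. This is only a minimal sketch: the demo_ names and the DemoTransactionId typedef are hypothetical stand-ins rather than the PostgreSQL definitions, and it assumes that normal XIDs start at 3, as in PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId (a 32-bit unsigned counter). */
typedef uint32_t DemoTransactionId;

/* Assumption: XIDs below 3 are the permanent (special) ones. */
#define DEMO_FIRST_NORMAL_XID ((DemoTransactionId) 3)

static inline bool
demo_xid_is_normal(DemoTransactionId xid)
{
	return xid >= DEMO_FIRST_NORMAL_XID;
}

/* Is id1 logically < id2? Mirrors the shape of TransactionIdPrecedes(). */
static inline bool
demo_xid_precedes(DemoTransactionId id1, DemoTransactionId id2)
{
	int32_t		diff;

	/* If either ID is permanent, a plain unsigned comparison is used. */
	if (!demo_xid_is_normal(id1) || !demo_xid_is_normal(id2))
		return id1 < id2;

	/*
	 * Both are normal: compare modulo 2^32, so an XID just before the
	 * wraparound point still precedes a small XID just after it.
	 */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

With this sketch, demo_xid_precedes(4294967290u, 10) is true even though the first argument is numerically larger, because the signed difference is negative.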



  [text/x-patch] v8-0018-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 19-v8-0018-Unset-all-visible-sooner-if-not-freezing.patch)
From 9c7bbbd397ba2b9b001ba2d0e8a7a52a79cc537b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v8 18/22] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f083189fccc..dea8491adbb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
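
The effect of the two hunks above can be illustrated outside the tree. The following is a rough sketch only: DemoPruneState and demo_record_dead() are hypothetical, cut-down stand-ins for PruneState and the heap_prune_record_*dead() routines, and the fixed-size array with its bounds check is an assumption made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ITEMS 8

/* Hypothetical, reduced version of the pruning state used in the patch. */
typedef struct DemoPruneState
{
	bool		attempt_freeze; /* will the caller try to freeze tuples? */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
	uint16_t	deadoffsets[DEMO_MAX_ITEMS];
} DemoPruneState;

/* Record a dead item, clearing all-visible early when not freezing. */
static void
demo_record_dead(DemoPruneState *prstate, uint16_t offnum)
{
	/*
	 * When freezing may be attempted, unsetting all_visible is deferred so
	 * that removable dead items do not preclude freezing the page. When it
	 * will not be attempted, there is nothing to wait for.
	 */
	if (!prstate->attempt_freeze)
		prstate->all_visible = prstate->all_frozen = false;

	/* Bounds check is specific to this sketch. */
	if (prstate->lpdead_items < DEMO_MAX_ITEMS)
		prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
```

A caller with attempt_freeze set to false sees all_visible cleared on the first dead item, while a caller with attempt_freeze set to true keeps both flags until a later stage decides.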



  [text/x-patch] v8-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch (26.8K, 20-v8-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 6a5e12b22d2e2c18bea556598e1d3ddffc7830cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v8 19/22] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 63 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 ++++++++++++++
 src/backend/access/table/tableam.c            | 39 ++++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 ++-
 src/backend/executor/nodeIndexscan.c          | 18 ++++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 ++++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  4 +-
 16 files changed, 276 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88f880cfd15..d99160d5f82 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dea8491adbb..1669d7b466e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b8b9d2a85f7..a862701edbe 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Time::HiRes qw(usleep);
 use Test::More;
+use Time::HiRes qw(usleep);
 
 if ($ENV{enable_injection_points} ne 'yes')
 {
@@ -296,6 +297,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v8-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch (18.9K, 21-v8-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch)
  download | inline diff:
From d55216a8a2fb16c176e245caca97e88ae35ad1f5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v8 20/22] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine, based on caller-provided options,
   heuristics, and state gathered during stage 2, whether or not to
   freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
 1 file changed, 295 insertions(+), 176 deletions(-)
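
Reader's note, not part of the patch: the ordering of the evaluation stage
described in the commit message (decide on freezing first, then decide which
VM bits to set) can be sketched as a small standalone C model. Everything
prefixed sketch_/SKETCH_ is hypothetical and exists only for illustration.
The single fpi_expected flag is an assumption that collapses the
did_tuple_hint_fpi / XLogCheckBufferNeedsBackup() heuristics into one
boolean. The early exits for attempt_freeze and consider_update_vm, the VM
corruption check, and the on-access dirtying check are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real visibility map flag bits. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical, heavily reduced stand-in for PruneState. */
typedef struct SketchPageState
{
	bool		all_visible;	/* every live tuple visible to everyone */
	bool		all_frozen;		/* page would be all-frozen if we freeze */
	int			lpdead_items;	/* LP_DEAD items left on the page */
	int			nfrozen;		/* number of prepared freeze plans */
	bool		freeze_required;	/* must freeze to advance relfrozenxid */
} SketchPageState;

/*
 * First half of the evaluation stage: decide whether to execute the freeze
 * plans. Also finalizes all_visible/all_frozen so that they describe the
 * page as it will look after the changes are applied.
 */
static bool
sketch_will_freeze(SketchPageState *st, bool fpi_expected)
{
	bool		do_freeze = false;

	if (st->freeze_required)
		do_freeze = true;
	else if (st->all_visible && st->all_frozen &&
			 st->nfrozen > 0 && fpi_expected)
		do_freeze = true;		/* opportunistic freeze */

	if (!do_freeze && st->nfrozen > 0)
	{
		/* We chose not to freeze, so the page won't be all-frozen. */
		st->all_frozen = false;
		st->nfrozen = 0;
	}

	/* LP_DEAD items mean the page is not all-visible right now. */
	if (st->lpdead_items > 0)
	{
		st->all_visible = false;
		st->all_frozen = false;
	}

	return do_freeze;
}

/*
 * Second half: decide which VM bits to set. Must run after
 * sketch_will_freeze(), because it relies on the finalized flags.
 */
static uint8_t
sketch_vm_flags(const SketchPageState *st, bool blk_known_av,
				bool vm_all_frozen)
{
	uint8_t		flags = 0;

	assert(!st->all_frozen || st->all_visible);

	if (st->all_visible &&
		(!blk_known_av || (st->all_frozen && !vm_all_frozen)))
	{
		flags |= SKETCH_VM_ALL_VISIBLE;
		if (st->all_frozen)
			flags |= SKETCH_VM_ALL_FROZEN;
	}

	return flags;
}
```

In this model, a page that is all-visible and could become all-frozen, but
whose freeze plans are skipped because no FPI is expected, ends up with only
the all-visible bit. That is why the freeze decision has to come first.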

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1669d7b466e..8b898fe19dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though they can be derived from buffer, to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output parameters. vmflags and
+ * set_pd_all_visible are output parameters.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
-
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
-	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
-	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
 	/* Save these for the caller in case we later zero out vmflags */
 	presult->new_vmbits = vmflags;
 
-	/* Any error while applying the changes is critical */
+	/*
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
+	 */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0



  [text/x-patch] v8-0021-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 22-v8-0021-Reorder-heap_page_prune_and_freeze-parameters.patch)
  download | inline diff:
From d68200de41024bd739177bca24cb51f3f37626b5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v8 21/22] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8b898fe19dd..0a7a4ba0c0e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61c6b3d21ac..cead3ec84a4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1991,11 +1991,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v8-0022-Set-pd_prune_xid-on-insert.patch (6.5K, 23-v8-0022-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 5dcb61d6fba53255bbc3356afb90e575ecf7789d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v8 22/22] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)
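
Reader's note, not part of the patch: a standalone C model of how the
page-level prunable hint behaves when inserts set it, as I understand
PageSetPrunable(). The hint keeps the oldest normal XID it has seen, and
special XIDs (such as the bootstrap XID) never become the hint. That is
what the TransactionIdIsNormal() guard in the hunks below is for. All
sketch_/SKETCH_ names are hypothetical. The wraparound-aware comparison is
an assumption modelled on TransactionIdPrecedes() for normal XIDs only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;		/* hypothetical stand-in for TransactionId */

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/* Special XIDs (invalid, bootstrap, frozen) sort below the first normal one. */
static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparison; only meaningful when both XIDs are normal. */
static bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Record xid in the page's prune hint if the hint is unset or xid is older
 * than the current hint. Non-normal XIDs are ignored, mirroring the
 * caller-side guard in heap_insert() and heap_multi_insert().
 */
static void
sketch_set_prunable(SketchXid *pd_prune_xid, SketchXid xid)
{
	if (!sketch_xid_is_normal(xid))
		return;

	if (*pd_prune_xid == SKETCH_INVALID_XID ||
		sketch_xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}
```

Starting from an unset hint, inserts by XIDs 100, 200 and 50 leave the hint
at 50 in this model, and an insert under the bootstrap XID leaves it
untouched.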

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d99160d5f82..28da6a1a0fb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2546,8 +2550,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 83a5f3dbc34..67256280d94 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -474,6 +474,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -623,9 +629,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-02 23:54  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Andres Freund @ 2025-09-02 23:54 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
> From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 27 Aug 2025 08:50:15 -0400
> Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay

LGTM.


> From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 27 Aug 2025 10:07:29 -0400
> Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set

LGTM.


> From 07f31099754636ec9dabf6cca06c33c4b19c230c Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 17 Jun 2025 17:22:10 -0400
> Subject: [PATCH v8 03/22] Eliminate xl_heap_visible in COPY FREEZE
>
> Instead of emitting a separate WAL record for setting the VM bits in
> xl_heap_visible, specify the changes to make to the VM block in the
> xl_heap_multi_insert record instead.
>
> Author: Melanie Plageman <[email protected]>
> Reviewed-by: Kirill Reshke <[email protected]>
> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com


> +		/*
> +		 * If we're only adding already frozen rows to a previously empty
> +		 * page, mark it as all-frozen and update the visibility map. We're
> +		 * already holding a pin on the vmbuffer.
> +		 */
>  		else if (all_frozen_set)
> +		{
> +			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
>  			PageSetAllVisible(page);
> +			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
> +			visibilitymap_set_vmbyte(relation,
> +									 BufferGetBlockNumber(buffer),
> +									 vmbuffer,
> +									 VISIBILITYMAP_ALL_VISIBLE |
> +									 VISIBILITYMAP_ALL_FROZEN);
> +		}




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-03 09:06  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-09-03 09:06 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, 3 Sept 2025 at 04:11, Melanie Plageman
<[email protected]> wrote:
>
> On Tue, Sep 2, 2025 at 5:52 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <[email protected]> wrote:
> > >
> > > I did micro git-blame research here. I spotted only one related change
> > > [0]. Looks like before this change pin was indeed needed.
> > > But not after this change, so this visibilitymap_pin is just an oversight?
> > > Related thread is [1]. I quickly checked the discussion in this
> > > thread, and it looks like no one was bothered about these lines or VM
> > > logging changes (in this exact pin buffer aspect). The discussion was
> > > of other aspects of this commit.
> >
> > Wow, thanks so much for doing that research. Looking at it myself, it
> > does indeed seem like just an oversight. It isn't harmful since it
> > won't take another pin, but it is confusing, so I think we should at
> > least remove it in master. I'm not as sure about back branches.
>
> I've updated the commit message in the patch set to reflect the
> research you did in attached v8.
>
> - Melanie



Hi!

Some small comments regarding the new series:

0001, 0002, 0017 LGTM


In 0015:

```
reshke@yezzey-cbdb-bench:~/postgres$ git diff src/backend/access/heap/pruneheap.c
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05b51bd8d25..0794af9ae89 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1398,7 +1398,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
                                /*
                                 * For now always use prstate->cutoffs for this test, because
                                 * we only update 'all_visible' when freezing is requested. We
-                                * could use GlobalVisTestIsRemovableXid instead, if a
+                                * could use GlobalVisXidVisibleToAll instead, if a
                                 * non-freezing caller wanted to set the VM bit.
                                 */
                                Assert(prstate->cutoffs);
```

Also, maybe GlobalVisXidTestAllVisible is a slightly better name? (The
term 'all-visible' is one that we occasionally utilize)


-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-05 22:20  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-09-05 22:20 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the review!

On Tue, Sep 2, 2025 at 7:54 PM Andres Freund <[email protected]> wrote:
>
> On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
> > From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 27 Aug 2025 08:50:15 -0400
> > Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay

I didn't push it yet because I did a new version that actually
eliminates the asserts in heap_multi_insert() before calling
visibilitymap_set() -- since they are redundant with checks inside
visibilitymap_set(). 0001 of attached v9 is what I plan to push,
barring any objections.

> > From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 27 Aug 2025 10:07:29 -0400
> > Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set

I pushed this.

> From an abstraction POV I don't love that heapam now is responsible for
> acquiring and releasing the lock. But that ship already kind of has sailed, as
> heapam.c is already responsible for releasing the vm buffer etc...
>
> I've wondered about splitting the responsibilities up into multiple
> visibilitymap_set_* functions, so that heapam.c wouldn't need to acquire the
> lock and set the LSN. But it's probably not worth it.

Yea, I explored heap wrappers coupling heap operations related to
setting the VM along with the VM updates [1], but the results weren't
appealing. Setting the heap LSN and marking the heap buffer dirty and
such happens in a different place in different callers because it is
happening as part of the operations that actually end up rendering the
page all-visible.

And a VM-only helper would literally just acquire and release the lock
and set the LSN on the vm page -- which I don't think is worth it.

> > +     /*
> > +      * Now read and update the VM block. Even if we skipped updating the heap
> > +      * page due to the file being dropped or truncated later in recovery, it's
> > +      * still safe to update the visibility map.  Any WAL record that clears
> > +      * the visibility map bit does so before checking the page LSN, so any
> > +      * bits that need to be cleared will still be cleared.
> > +      *
> > +      * It is only okay to set the VM bits without holding the heap page lock
> > +      * because we can expect no other writers of this page.
> > +      */
> > +     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
> > +             XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
> > +                                                                       &vmbuffer) == BLK_NEEDS_REDO)
> > +     {
> > +             Relation        reln = CreateFakeRelcacheEntry(rlocator);
> > +
> > +             Assert(visibilitymap_pin_ok(blkno, vmbuffer));
> > +             visibilitymap_set_vmbyte(reln, blkno,
> > +                                                              vmbuffer,
> > +                                                              VISIBILITYMAP_ALL_VISIBLE |
> > +                                                              VISIBILITYMAP_ALL_FROZEN);
> > +
> > +             /*
> > +              * It is not possible that the VM was already set for this heap page,
> > +              * so the vmbuffer must have been modified and marked dirty.
> > +              */
> > +             Assert(BufferIsDirty(vmbuffer));
>
> How about making visibilitymap_set_vmbyte() return whether it needed to do
> something? This seems somewhat indirect...

It does return the state of the previous bits. But, I am specifically
asserting that the buffer is dirty because I am about to set the page
LSN. So I don't just care that changes were made, I care that we
remembered to mark the buffer dirty.

> I think it might be good to encapsulate this code into a helper in
> visibilitymap.c, there will be more callers in the subsequent patches.

By the end of the set, the different callers have different
expectations (some don't expect the buffer to have been dirtied
necessarily) and where they do the various related operations is
spread out depending on the caller. I just couldn't come up with a
helper solution I liked.

That being said, I definitely don't think it's needed for this patch
(logging setting the VM in xl_heap_multi_insert()).

> > +uint8
> > +visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
> > +                                              Buffer vmBuf, uint8 flags)
>
> Why is it named vmbyte? This actually just sets the two bits corresponding to
> the buffer, not the entire byte. So it seems somewhat misleading to reference
> byte.

Renamed it to visibilitymap_set_vmbits.
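
To illustrate the naming point, here is a minimal standalone sketch (not code from the patch set) of setting only the two bits that belong to one heap block and returning the previous bits. SketchVMPage and sketch_set_vmbits are hypothetical names, the map is a plain in-memory array rather than a buffer, and the single-page layout is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK			2	/* two VM bits per heap block */
#define HEAPBLOCKS_PER_BYTE			4	/* 8 / BITS_PER_HEAPBLOCK */
#define VISIBILITYMAP_VALID_BITS	0x03

/* Hypothetical stand-in for a VM page: a small byte map plus a dirty flag. */
typedef struct SketchVMPage
{
	uint8_t		map[16];
	bool		dirty;
} SketchVMPage;

/*
 * Set the VM bits for one heap block and return the bits that were set
 * before. Only the two bits for this block are touched; the other three
 * heap blocks sharing the byte are left alone.
 */
static uint8_t
sketch_set_vmbits(SketchVMPage *vm, uint32_t heapblk, uint8_t flags)
{
	uint32_t	mapbyte = heapblk / HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (uint8_t) ((heapblk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
	uint8_t		old;

	assert(mapbyte < sizeof(vm->map));
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);

	old = (uint8_t) ((vm->map[mapbyte] >> shift) & VISIBILITYMAP_VALID_BITS);

	/* Dirty the page only if the stored bits actually change. */
	if (old != flags)
	{
		vm->map[mapbyte] |= (uint8_t) (flags << shift);
		vm->dirty = true;
	}

	return old;
}
```

A caller that is about to stamp an LSN on the VM page can compare the return value with the requested flags to see whether anything changed, and separately check the dirty flag to confirm the page was marked dirty.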

> > Instead of emitting a separate xl_heap_visible record for each page that
> > is rendered all-visible by vacuum's third phase, include the updates to
> > the VM in the already emitted xl_heap_prune record.
>
> Reading through the change I didn't particularly like that there's another
> optional field in xl_heap_prune, as it seemed like something that should be
> encoded in flags.  Of course there aren't enough flag bits available.  But
> that made me look at the rest of the record: Uh, what do we use the reason
> field for?  As far as I can tell f83d709760d8 added it without introducing any
> users? It doesn't even seem to be set.

Yikes, you are right about the "reason" member. Attached 0002 removes
it, and I'll go ahead and fix it in the back branches too. I can't
fathom how that slipped through the cracks. We do pass the PruneReason
for setting the rmgr info about what type of record it is (i.e. if it
is one emitted by vacuum phase I, phase III, or on-access pruning).
But we don't need or use a separate member. I went back and tried to
figure out what the rationale was, but I couldn't find anything.

As for the VM flags being an optional unaligned member -- in v9, I've
expanded the flags member to a uint16 to make room for the extra
flags. Seems we've been surviving with using up 2 bytes this long.
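
As a rough standalone sketch of the idea (not the actual v9 definitions), the VM bits can share one 16-bit flags word with the other record flags as long as the record flags stay out of the low two bits. The VISIBILITYMAP_* values below are the ones from visibilitymapdefs.h; the SKETCH_XLHP_* bit positions and both helper functions are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Hypothetical bit positions, for illustration only; not the v9 values. */
#define SKETCH_XLHP_CLEANUP_LOCK	(1 << 2)
#define SKETCH_XLHP_HAS_DEAD_ITEMS	(1 << 8)

/* Combine the VM bits and the other record flags into one 16-bit word. */
static uint16_t
sketch_pack_flags(uint8_t vmflags, uint16_t xlhp_flags)
{
	/* Only valid VM bits may be passed in. */
	assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
	/* The other record flags must not overlap the VM bits. */
	assert((xlhp_flags & VISIBILITYMAP_VALID_BITS) == 0);

	return (uint16_t) (vmflags | xlhp_flags);
}

/* Recover the VM bits from the flags word, as replay would need to. */
static uint8_t
sketch_unpack_vmflags(uint16_t flags)
{
	return (uint8_t) (flags & VISIBILITYMAP_VALID_BITS);
}
```

With a uint8 flags member, a bit such as the hypothetical SKETCH_XLHP_HAS_DEAD_ITEMS at position 8 would not fit, which is the reason for widening the member.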

> > @@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
> >                  (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
> >
> >       /*
> > -      * We are about to remove and/or freeze tuples.  In Hot Standby mode,
> > -      * ensure that there are no queries running for which the removed tuples
> > -      * are still visible or which still consider the frozen xids as running.
> > -      * The conflict horizon XID comes after xl_heap_prune.
> > +      * After xl_heap_prune is the optional snapshot conflict horizon.
> > +      *
> > +      * In Hot Standby mode, we must ensure that there are no running queries
> > +      * which would conflict with the changes in this record. If pruning, that
> > +      * means we cannot remove tuples still visible to transactions on the
> > +      * standby. If freezing, that means we cannot freeze tuples with xids that
> > +      * are still considered running on the standby. And for setting the VM, we
> > +      * cannot do so if the page isn't all-visible to all transactions on the
> > +      * standby.
> >        */
>
> I'm a bit confused by this new comment - it sounds like we're deciding whether
> to remove tuple versions, but that decision has long been made, no?

Well, the comment is a revision of a comment that was already there on
essentially why replaying this record could cause recovery conflicts.
It mentioned pruning and freezing, so I expanded it to mention setting
the VM. Taking into account your confusion, I tried rewording it in
attached v9.

> > +     if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
> > +                                                                                        vacrel->cutoffs.OldestXmin,
> > +                                                                                        deadoffsets, num_offsets,
> > +                                                                                        &all_frozen, &visibility_cutoff_xid,
> > +                                                                                        &vacrel->offnum))
>
> I am rather confused - we never can set all-visible if there are any LP_DEAD
> items left. If the idea is that we are removing the LP_DEAD items in
> lazy_vacuum_heap_page() - what guarantees that all LP_DEAD items are being
> removed? Couldn't some tuples get marked LP_DEAD by on-access pruning, after
> vacuum visited the page and collected dead items?
>
> Ugh, I see - it works because we pass in the set of dead items.  I think that
> makes the name *really* misleading, it's not except LP_DEAD, it's except the
> offsets passed in, no?
>
> But then you actually check that the set of dead items didn't change - what
> guarantees that?

So, I pass in the deadoffsets we got from the TIDStore. If the only
dead items on the page are exactly those dead items, then the page
will be all-visible as soon as we set those LP_UNUSED -- which we do
unconditionally. And we have the lock on the page, so no one can
on-access prune and make new dead items while we are in
lazy_vacuum_heap_page().

Given your confusion, I've refactored this and used a different
approach -- I explicitly check the passed-in deadoffsets array when I
encounter a dead item and see if it is there. That should hopefully
make it more clear.
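
Roughly, the check described above could look like the following standalone sketch. This is not the code in the patch: SketchItemState and the sketch_* functions are hypothetical, the page is modeled as a plain array of line pointer states, and the tuple-level visibility checks that the real function also needs are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;	/* 1-based line pointer offset */

/* Hypothetical, simplified line pointer states. */
typedef enum
{
	SKETCH_LP_UNUSED,
	SKETCH_LP_NORMAL,
	SKETCH_LP_DEAD
} SketchItemState;

/* Is this offset one of the dead items vacuum is about to set unused? */
static bool
sketch_offset_is_expected_dead(OffsetNumber off,
							   const OffsetNumber *deadoffsets,
							   int ndeadoffsets)
{
	for (int i = 0; i < ndeadoffsets; i++)
	{
		if (deadoffsets[i] == off)
			return true;
	}
	return false;
}

/*
 * items[i] is the state of the line pointer at offset i + 1. The page
 * would be all-visible after reaping only if every dead item found on it
 * is in the passed-in array. Tuple visibility is assumed to be fine here.
 */
static bool
sketch_would_be_all_visible(const SketchItemState *items, int nitems,
							const OffsetNumber *deadoffsets,
							int ndeadoffsets)
{
	assert(nitems >= 0 && ndeadoffsets >= 0);

	for (int i = 0; i < nitems; i++)
	{
		OffsetNumber off = (OffsetNumber) (i + 1);

		if (items[i] == SKETCH_LP_DEAD &&
			!sketch_offset_is_expected_dead(off, deadoffsets, ndeadoffsets))
			return false;
	}

	return true;
}
```

A dead item that is missing from the array makes the function return false, so an unexpected LP_DEAD item can never lead to the page being marked all-visible.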

> I didn't look at the later patches, except that I did notice this:
<--snip-->
> Why are we manually pinning the vm buffer here? Shouldn't the xlog machinery
> have done so, as you noticed in one of the early on patches?

Fixed. Thanks!

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_Yj%3DyrL%2BgGGsqfYVQcYn7rDp6hDeoF1vN453JDp8dEY%2Bw...


Attachments:

  [text/x-patch] v9-0002-Remove-unused-xl_heap_prune-member-reason.patch (1.1K, 2-v9-0002-Remove-unused-xl_heap_prune-member-reason.patch)
From df9b87d0a1a973c0c655f5ba858485795ff98951 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Sep 2025 15:02:58 -0400
Subject: [PATCH v9 02/22] Remove unused xl_heap_prune member, reason

f83d709760d8 refactored xl_heap_prune and added an unused member,
reason. While PruneReason is used when constructing this WAL record to
set the WAL record definition, it doesn't need to be stored in a
separate field in the record. Remove it.

Author: Melanie Plageman <[email protected]>
Reported-by: Andres Freund <[email protected]>

Discussion: https://postgr.es/m/tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y%40hbijsndifu45
---
 src/include/access/heapam_xlog.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d4c0625b632 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -284,7 +284,6 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		reason;
 	uint8		flags;
 
 	/*
-- 
2.43.0



  [text/x-patch] v9-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (28.4K, 3-v9-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
From 81b134346c1a981382d1eb915472aa3f26bb3586 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v9 05/22] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 145 ++++++++++++++++++----
 src/backend/access/heap/pruneheap.c    |  66 ++++++++--
 src/backend/access/heap/vacuumlazy.c   | 164 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |   7 +-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |  36 ++++--
 6 files changed, 330 insertions(+), 97 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0820f7d052d..11c11929ed9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +156,117 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, measure the page's freespace to later update the
+	 * freespace map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is _only_ okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
-			UnlockReleaseBuffer(buffer);
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+
+		FreeFakeRelcacheEntry(reln);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+	xlrec.flags = vmflags;
 
-	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7f6f684bc63..a50652ca5a0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2846,8 +2848,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2858,6 +2863,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2877,6 +2896,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		PageSetAllVisible(page);
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		visibilitymap_set_vmbits(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2886,7 +2917,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2895,39 +2929,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3594,40 +3601,85 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used by
+ * callers that expect no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
  *
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of LP_DEAD items the caller already knows about
+ * and whose index entries it has already removed. Vacuum calls this before
+ * setting those line pointers LP_UNUSED, so if the page has no other LP_DEAD
+ * items, the caller can set it all-visible in the VM.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
  *
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
  *
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
  *
  * *logging_offnum will have the OffsetNumber of the current tuple being
  * processed for vacuum's error callback system.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst
+ * the visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3655,9 +3707,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..c95d30dfe8d 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -279,7 +279,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			TransactionId conflict_xid;
 
 			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
-
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
 		}
@@ -287,6 +286,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
 
-/* to handle recovery conflict during logical decoding on standby */
-#define		XLHP_IS_CATALOG_REL			(1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define		XLHP_IS_CATALOG_REL			(1 << 2)
 
 /*
  * Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
  * marks LP_DEAD line pointers as unused without moving any tuple data, an
  * ordinary exclusive lock is sufficient.
  */
-#define		XLHP_CLEANUP_LOCK	       (1 << 2)
+#define		XLHP_CLEANUP_LOCK	       (1 << 3)
 
 /*
  * If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
  * there are no queries running for which the removed tuples are still
  * visible, or which still consider the frozen XIDs as running.
  */
-#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 3)
+#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
 
 /*
  * Indicates that an xlhp_freeze_plans sub-record and one or more
  * xlhp_freeze_plan sub-records are present.
  */
-#define		XLHP_HAS_FREEZE_PLANS		(1 << 4)
+#define		XLHP_HAS_FREEZE_PLANS		(1 << 5)
 
 /*
  * XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
  * indicate that xlhp_prune_items sub-records with redirected, dead, and
  * unused item offsets are present.
  */
-#define		XLHP_HAS_REDIRECTIONS		(1 << 5)
-#define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
-#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
+#define		XLHP_HAS_REDIRECTIONS		(1 << 6)
+#define		XLHP_HAS_DEAD_ITEMS	        (1 << 7)
+#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)
 
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0

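The two conditions in log_heap_prune_and_freeze() above (whether the heap
buffer may be registered with REGBUF_NO_IMAGE, and whether the heap page LSN
must be bumped) can be sketched as a pair of standalone predicates. This is
only an illustration, not PostgreSQL code: both function names are
hypothetical, and the hint_bits_need_wal argument stands in for the result of
XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the REGBUF_NO_IMAGE test in the patch.
 * hint_bits_need_wal stands in for XLogHintBitIsNeeded().
 */
static bool
can_skip_heap_fpi(bool do_prune, int nfrozen,
				  bool set_pd_all_vis, bool hint_bits_need_wal)
{
	return !do_prune && nfrozen == 0 &&
		(!set_pd_all_vis || !hint_bits_need_wal);
}

/* Hypothetical predicate mirroring the test that guards PageSetLSN(). */
static bool
must_set_heap_lsn(bool do_prune, int nfrozen,
				  bool set_pd_all_vis, bool hint_bits_need_wal)
{
	return do_prune || nfrozen > 0 ||
		(set_pd_all_vis && hint_bits_need_wal);
}
```

For nfrozen >= 0 the two predicates are logical complements, so in this
sketch the heap page LSN is advanced exactly in the cases where a full-page
image of the heap page is still allowed.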

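The LP_DEAD handling added to heap_page_would_be_all_visible() above relies
on deadoffsets[] being strictly ascending, so a single cursor is enough to
match it against the page. A minimal sketch of just that matching step
follows. The SketchLpState enum and only_known_dead_items() are hypothetical
stand-ins invented for this example, and the tuple visibility checks of the
real function are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for line pointer states; not the real ItemIdData. */
typedef enum
{
	SK_LP_UNUSED,
	SK_LP_NORMAL,
	SK_LP_REDIRECT,
	SK_LP_DEAD
} SketchLpState;

/*
 * Returns true if every dead item on the "page" is one the caller listed.
 * lp[0] corresponds to offset number 1. deadoffsets must be strictly
 * ascending, which the patch checks with an assertion.
 */
static bool
only_known_dead_items(const SketchLpState *lp, int maxoff,
					  const uint16_t *deadoffsets, int ndeadoffsets)
{
	int			matched_dead_count = 0;

	for (int offnum = 1; offnum <= maxoff; offnum++)
	{
		if (lp[offnum - 1] != SK_LP_DEAD)
			continue;		/* the real function checks visibility here */

		if (deadoffsets == NULL ||
			matched_dead_count >= ndeadoffsets ||
			deadoffsets[matched_dead_count] != offnum)
			return false;	/* an LP_DEAD item the caller did not list */

		matched_dead_count++;
	}

	return true;
}
```

Passing NULL and 0 for the last two arguments reproduces the behavior of the
heap_page_is_all_visible() wrapper: any LP_DEAD item makes the check fail.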

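The renumbering of the XLHP_* flags in heapam_xlog.h above can be sketched as
follows. The constants are copied from the patch and from
visibilitymapdefs.h; pack_prune_flags() and vm_bits_of() are hypothetical
helpers that exist only in this example. The point is to show why flags had
to grow to uint16: reserving bits 0 and 1 for the VM bits pushes
XLHP_HAS_NOW_UNUSED_ITEMS up to bit 8.

```c
#include <assert.h>
#include <stdint.h>

/* VM bits, as defined in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* A few XLHP_* bits after the renumbering in this patch */
#define XLHP_IS_CATALOG_REL			(1 << 2)
#define XLHP_CLEANUP_LOCK			(1 << 3)
#define XLHP_HAS_NOW_UNUSED_ITEMS	(1 << 8)

/* Hypothetical helper: combine VM bits and XLHP_* bits into one flags word. */
static uint16_t
pack_prune_flags(uint8_t vmflags, uint16_t xlhp_flags)
{
	/* Mirrors the Assert at the top of log_heap_prune_and_freeze() */
	assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
	/* XLHP_* bits must stay clear of the two reserved VM bits */
	assert((xlhp_flags & VISIBILITYMAP_VALID_BITS) == 0);

	return (uint16_t) (vmflags | xlhp_flags);
}

/* Hypothetical helper: recover the VM bits, as replay or pg_waldump would. */
static uint8_t
vm_bits_of(uint16_t flags)
{
	return (uint8_t) (flags & VISIBILITYMAP_VALID_BITS);
}
```

With both VM bits, XLHP_CLEANUP_LOCK and XLHP_HAS_NOW_UNUSED_ITEMS set, the
packed value no longer fits in eight bits, which is the reason for the type
change in xl_heap_prune and in heap_xlog_deserialize_prune_and_freeze().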
  [text/x-patch] v9-0001-Remove-unneeded-VM-pin-from-VM-replay.patch (2.5K, 4-v9-0001-Remove-unneeded-VM-pin-from-VM-replay.patch)
From 686edbfbe6556da8cdd6219fd9cd270ccfc9bb32 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v9 01/22] Remove unneeded VM pin from VM replay

Previously, heap_xlog_visible() called visibilitymap_pin() even after
getting a buffer from XLogReadBufferForRedoExtended() -- which returns a
pinned buffer containing the specified block of the visibility map.

This would just have resulted in visibilitymap_pin() returning early
since the specified page was already present and pinned, but it was
confusing extraneous code, so remove it.

It appears to be an oversight in 2c03216.

While we are at it, remove two VM-related redundant asserts in the COPY
FREEZE code path. visibilitymap_set() already asserts that
PD_ALL_VISIBLE is set on the heap page and checks that the vmbuffer
contains the bits corresponding to the specified heap block, so callers
do not also need to check this.

Author: Melanie Plageman <[email protected]>
Reported-by: Melanie Plageman <[email protected]>
Reported-by: Kirill Reshke <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>

Discussion: https://postgr.es/m/CALdSSPhu7WZd%2BEfQDha1nz%3DDC93OtY1%3DUFEdWwSZsASka_2eRQ%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 3 ---
 src/backend/access/heap/heapam_xlog.c | 1 -
 2 files changed, 4 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..4c5ae205a7a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2647,9 +2647,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 */
 		if (all_frozen_set)
 		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
 			/*
 			 * It's fine to use InvalidTransactionId here - this is only used
 			 * when HEAP_INSERT_FROZEN is specified, which intentionally
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5d48f071f53..cf843277938 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -295,7 +295,6 @@ heap_xlog_visible(XLogReaderState *record)
 		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		reln = CreateFakeRelcacheEntry(rlocator);
-		visibilitymap_pin(reln, blkno, &vmbuffer);
 
 		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
-- 
2.43.0



  [text/x-patch] v9-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (11.3K, 5-v9-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
From 7b6222f1670a0078c32383e64fb3782f555a6564 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v9 03/22] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate xl_heap_visible WAL record to set the VM
bits, specify the changes to make to the VM block in the
xl_heap_multi_insert record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 47 ++++++++++-------
 src/backend/access/heap/heapam_xlog.c   | 43 +++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 144 insertions(+), 20 deletions(-)

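The bit arithmetic behind the visibilitymap_set_vmbits() function added by
this patch can be sketched on a plain byte array. This is a simplified
illustration under stated assumptions, not the real implementation:
sketch_set_vmbits() and SKETCH_MAP_BYTES are hypothetical, there is no
buffer, lock or WAL handling, and the map is treated as a single small page
so the per-page modulo of the real macros is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK			2
#define HEAPBLOCKS_PER_BYTE			4	/* 8 bits / BITS_PER_HEAPBLOCK */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Assumed toy size; a real map page covers far more heap blocks. */
#define SKETCH_MAP_BYTES			16

/*
 * Hypothetical analogue of visibilitymap_set_vmbits() on a byte array.
 * Returns the status bits that were set before the call.
 */
static uint8_t
sketch_set_vmbits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) *
										BITS_PER_HEAPBLOCK);
	uint8_t		status;

	assert(map_byte < SKETCH_MAP_BYTES);
	/* Same sanity checks as the Asserts in the patch */
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);	/* only sets bits */

	return status;
}
```

Because the update is a bitwise OR, calling the function with a subset of the
bits already set leaves the map unchanged, which matches the patch's note
that this function never clears bits.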
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..893a739009a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2516,8 +2513,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbits(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2576,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2644,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2657,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..0820f7d052d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,46 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		visibilitymap_set_vmbits(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..bb8dfd8910a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0



  [text/x-patch] v9-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (5.4K, 6-v9-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch)
From abd46a0e574456401cb34380236673239c317361 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v9 04/22] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 932701d8420..7f6f684bc63 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2906,8 +2910,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3590,10 +3594,18 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3601,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3626,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3650,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3673,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3708,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0
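
Note on the patch above: heap_page_is_all_visible() no longer takes an
LVRelState. It now receives the relation, OldestXmin, and a pointer through
which it reports the offset currently being examined. The out-parameter
pattern can be sketched on its own, as below. This is a rough standalone
illustration and not PostgreSQL code: SketchOffset, SKETCH_INVALID_OFFSET and
sketch_page_is_all_visible() are hypothetical names, and the "page" is just an
array of booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t SketchOffset;      /* stand-in for OffsetNumber */
#define SKETCH_INVALID_OFFSET 0     /* stand-in for InvalidOffsetNumber */

/*
 * Hypothetical scan: item_visible[i] describes the item at offset i + 1.
 * The caller supplies logging_offnum, so the function needs no
 * vacuum-specific state to report progress to an error callback.
 */
static bool
sketch_page_is_all_visible(const bool *item_visible, SketchOffset nitems,
                           SketchOffset *logging_offnum)
{
    bool        all_visible = true;

    assert(item_visible != NULL && logging_offnum != NULL);

    for (unsigned int off = 1; off <= nitems; off++)
    {
        /* Publish the offset before examining the item. */
        *logging_offnum = (SketchOffset) off;

        if (!item_visible[off - 1])
        {
            all_visible = false;
            break;
        }
    }

    /* Clear the offset once the page has been processed. */
    *logging_offnum = SKETCH_INVALID_OFFSET;

    return all_visible;
}
```

A vacuum-style caller would pass the address of its own error-context field
(as the patch does with &vacrel->offnum); any other caller can pass a local
variable.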



  [text/x-patch] v9-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (5.8K, 7-v9-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch)
From 15eb77d2b54d4856d6dd392c48cb68d6721d20ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v9 06/22] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
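Reviewer note: the buffer-registration flag choice that this patch changes in
log_heap_prune_and_freeze() can be looked at in isolation. The following is a
minimal standalone sketch under stated assumptions, not the patch's code: the
SKETCH_* constants are stand-ins for the REGBUF_* flags (their values here are
arbitrary), hints_needed stands in for XLogHintBitIsNeeded(), and
choose_regbuf_flags() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for REGBUF_STANDARD, REGBUF_FORCE_IMAGE and REGBUF_NO_IMAGE. */
#define SKETCH_REGBUF_STANDARD     0x01
#define SKETCH_REGBUF_FORCE_IMAGE  0x02
#define SKETCH_REGBUF_NO_IMAGE     0x04

/*
 * Hypothetical helper mirroring the if / else if chain in the patch.
 * hints_needed stands in for XLogHintBitIsNeeded().
 */
static uint8_t
choose_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
                    bool set_pd_all_vis, bool hints_needed)
{
    uint8_t     flags = SKETCH_REGBUF_STANDARD;

    assert(nfrozen >= 0);

    if (force_heap_fpi)
        flags |= SKETCH_REGBUF_FORCE_IMAGE;     /* forcing an image wins */
    else if (!do_prune && nfrozen == 0 &&
             (!set_pd_all_vis || !hints_needed))
        flags |= SKETCH_REGBUF_NO_IMAGE;        /* no image is needed */

    return flags;
}
```

Because the second test is an else branch, a forced image can never be
combined with the no-image flag, which is the property the empty-page case
relies on when the page has never been WAL-logged.
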
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a50652ca5a0..edd28123b7d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbits(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2917,6 +2931,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0



  [text/x-patch] v9-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.1K, 8-v9-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From c711696d07304ca3130a56dd9b068779c74e5ec2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v9 07/22] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
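Reviewer note: the two corruption checks that this patch extracts can be
modelled as a pure function of four inputs. The sketch below is a standalone
illustration, not the patch's code: SketchVmCorruption and
classify_vm_corruption() are hypothetical names, and the page and VM state are
passed in as plain values instead of being read from buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    SKETCH_VM_OK,                       /* no contradiction found */
    SKETCH_VM_SET_BUT_PAGE_CLEAR,       /* VM bit set, PD_ALL_VISIBLE clear */
    SKETCH_LPDEAD_ON_ALL_VISIBLE_PAGE   /* LP_DEAD items, PD_ALL_VISIBLE set */
} SketchVmCorruption;

/*
 * blk_known_av:   VM status remembered from before the page was locked
 * pd_all_visible: current PD_ALL_VISIBLE state of the heap page
 * vm_status:      VM bits as rechecked while holding the buffer lock
 * nlpdead_items:  number of LP_DEAD items found on the page
 */
static SketchVmCorruption
classify_vm_corruption(bool blk_known_av, bool pd_all_visible,
                       uint8_t vm_status, int64_t nlpdead_items)
{
    assert(nlpdead_items >= 0);

    if (blk_known_av && !pd_all_visible && vm_status != 0)
        return SKETCH_VM_SET_BUT_PAGE_CLEAR;

    if (nlpdead_items > 0 && pd_all_visible)
        return SKETCH_LPDEAD_ON_ALL_VISIBLE_PAGE;

    return SKETCH_VM_OK;
}
```

The first check requires PD_ALL_VISIBLE to be clear and the second requires it
to be set, so at most one of them can fire for a given page.
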
 src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index edd28123b7d..1474835c74b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1932,6 +1938,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2078,9 +2144,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2132,45 +2203,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0



  [text/x-patch] v9-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (11.7K, 9-v9-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch)
From 1b86b5724fc3468457f1e2d5d57df4c708080164 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v9 09/22] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
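Reviewer note: the new HEAP_PAGE_PRUNE_UPDATE_VM option is what gates the
corruption check inside heap_page_prune_and_freeze(). The sketch below shows
how the option word is built and tested. The three flag values are the ones in
heapam.h after this patch; vacuum_prune_options() and checks_vm_corruption()
are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* "options" flag bits, as defined in heapam.h by this patch. */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE          (1 << 1)
#define HEAP_PAGE_PRUNE_UPDATE_VM       (1 << 2)

/* Hypothetical helper: the options lazy_scan_prune() passes after the patch. */
static int
vacuum_prune_options(int nindexes)
{
    int         options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;

    assert(nindexes >= 0);

    if (nindexes == 0)
        options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

    return options;
}

/* Hypothetical helper: the test applied before the corruption check runs. */
static bool
checks_vm_corruption(int options)
{
    return (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
}
```

On-access pruning passes 0 for options in this patch, so it skips the
corruption check and the InvalidBuffer it passes as vmbuffer is never used.
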
 src/backend/access/heap/pruneheap.c  | 87 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 96 insertions(+), 72 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..5c08a5d44c7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 RelationGetRelationName(relation), heap_blk);
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cbe37369790..d49c71bc1b5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1938,65 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 RelationGetRelationName(relation), heap_blk);
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 
 /* qsort comparator for sorting OffsetNumbers */
 static int
@@ -2055,11 +1990,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2144,10 +2082,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0



  [text/x-patch] v9-0008-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 10-v9-0008-Combine-vacuum-phase-I-VM-update-cases.patch)
From 50ca8c73a62531f8d1b30886551c492023ea9e47 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v9 08/22] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
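Reviewer note: the combined condition can be checked against the two cases it
replaces with a small truth-table style sketch. This is a standalone
illustration, not the patch's code: the SKETCH_* constants stand in for the
VISIBILITYMAP_* bits, vm_all_frozen stands in for the result of
VM_ALL_FROZEN(), hints_needed stands in for XLogHintBitIsNeeded(), and both
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Returns the VM bits to set, or 0 when no VM update is needed. */
static uint8_t
vm_bits_to_set(bool all_visible, bool all_frozen,
               bool all_visible_according_to_vm, bool vm_all_frozen)
{
    uint8_t     flags;

    /* all_frozen is only meaningful when all_visible is true. */
    if (!all_visible)
        return 0;

    /* Already all-visible in the VM and nothing new to record. */
    if (all_visible_according_to_vm && !(all_frozen && !vm_all_frozen))
        return 0;

    flags = SKETCH_VM_ALL_VISIBLE;
    if (all_frozen)
        flags |= SKETCH_VM_ALL_FROZEN;

    /* The frozen bit is never requested without the visible bit. */
    assert((flags & SKETCH_VM_ALL_VISIBLE) != 0);

    return flags;
}

/* Mirrors the new test guarding PageSetAllVisible() and MarkBufferDirty(). */
static bool
must_dirty_heap_page(bool pd_all_visible, bool hints_needed)
{
    return !pd_all_visible || hints_needed;
}
```

The second helper corresponds to the bug fix mentioned in the commit message:
when only the all-frozen bit is being added and hint-bit logging is needed,
the heap buffer is dirtied even though PD_ALL_VISIBLE is already set.
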
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1474835c74b..cbe37369790 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2151,11 +2151,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2168,21 +2183,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2203,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
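
The combined condition this patch introduces in lazy_scan_prune() can be read as a small decision function. The following is a minimal standalone sketch, not code from the patch: the function name, the SKETCH_ constants, and the boolean parameters (standing in for presult.all_visible, presult.all_frozen, all_visible_according_to_vm, and the result of VM_ALL_FROZEN()) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VM flag bits; values are chosen for this sketch. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/*
 * Return the VM flags that would be set for a page, or 0 if no VM update is
 * needed.  Mirrors the merged branch: update the VM when the page is
 * all-visible and either the VM does not yet say so, or the page is also
 * all-frozen and the VM does not yet say that.
 */
static uint8_t
sketch_vm_flags_to_set(bool page_all_visible, bool page_all_frozen,
					   bool vm_known_all_visible, bool vm_all_frozen)
{
	uint8_t		flags;

	/* all_frozen is only consulted once all_visible is known to be true */
	if (!page_all_visible)
		return 0;

	/* VM already all-visible and there is no new all-frozen bit to add */
	if (vm_known_all_visible && !(page_all_frozen && !vm_all_frozen))
		return 0;

	flags = SKETCH_VM_ALL_VISIBLE;
	if (page_all_frozen)
		flags |= SKETCH_VM_ALL_FROZEN;
	return flags;
}
```

Under these assumptions, the single branch covers both cases that the removed "else if" handled separately, which is why the second block could be deleted.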



  [text/x-patch] v9-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (3.1K, 11-v9-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch)
  download | inline diff:
From c947f3564585049b4349216cbbc57c42aaea8aaf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v9 10/22] Keep all_frozen updated too in
 heap_page_prune_and_freeze

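Previously, all_frozen was only considered valid when all_visible was
also set, so we did not bother to clear all_frozen every time we cleared
all_visible. Clear both together so that each flag is valid on its own.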

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
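
The invariant this patch establishes (all_frozen implies all_visible) can be illustrated with a minimal standalone sketch. The struct and helper names below are hypothetical stand-ins for the two PruneState fields and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two PruneState fields touched here. */
typedef struct SketchPruneState
{
	bool		all_visible;
	bool		all_frozen;
} SketchPruneState;

/* Start optimistic, as the prune code does when freezing is requested. */
static void
sketch_init(SketchPruneState *s)
{
	s->all_visible = true;
	s->all_frozen = true;
}

/* A tuple that is not visible to everyone clears both flags together. */
static void
sketch_clear_all_visible(SketchPruneState *s)
{
	s->all_visible = s->all_frozen = false;
}

/* A visible but unfrozen tuple clears only all_frozen. */
static void
sketch_clear_all_frozen(SketchPruneState *s)
{
	s->all_frozen = false;
}

/* Same shape as the Assert added before the critical section. */
static bool
sketch_invariant_holds(const SketchPruneState *s)
{
	return !s->all_frozen || s->all_visible;
}
```

Because every path that clears all_visible also clears all_frozen, a reader of all_frozen no longer has to test all_visible first.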

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5c08a5d44c7..18eab8d0518 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0



  [text/x-patch] v9-0011-Update-VM-in-pruneheap.c.patch (12.7K, 12-v9-0011-Update-VM-in-pruneheap.c.patch)
  download | inline diff:
From fe909609c0d76b835430169a2b7579b0177ca2d1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v9 11/22] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 107 deletions(-)
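
With the VM update moved into heap_page_prune_and_freeze(), lazy_scan_prune() only derives its logging counters from the returned old_vmbits and new_vmbits. A minimal standalone sketch of that counting logic follows; the struct, function name, and SKETCH_ constants are assumptions for illustration, standing in for the LVRelState counters and the VISIBILITYMAP_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VM flag bits; values are chosen for this sketch. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three vacuum logging counters. */
typedef struct SketchCounters
{
	long		new_visible;
	long		new_visible_frozen;
	long		new_frozen;
} SketchCounters;

/*
 * Update the counters from the VM bits before and after pruning.  Returns
 * true if the page was newly set all-frozen in the VM.  new_vmbits is 0
 * when no VM update was attempted, so nothing is counted in that case.
 */
static bool
sketch_count_vm_change(SketchCounters *c, uint8_t old_vmbits,
					   uint8_t new_vmbits)
{
	bool		page_frozen = false;

	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0)
	{
		c->new_visible++;
		if ((new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen++;
			page_frozen = true;
		}
	}
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
	{
		/* all-frozen is never set without all-visible */
		assert((new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0);
		c->new_frozen++;
		page_frozen = true;
	}

	return page_frozen;
}
```

Comparing old and new bits, rather than testing presult.all_frozen, keeps the counters accurate even if the VM changed between the earlier lookup and pruning.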

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 18eab8d0518..3483b5caff3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d49c71bc1b5..05d3d2a3267 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1932,7 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -1948,7 +1947,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1977,6 +1977,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1985,10 +1986,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2080,88 +2077,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0



  [text/x-patch] v9-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (4.1K, 13-v9-0013-Rename-PruneState.freeze-to-attempt_freeze.patch)
  download | inline diff:
From 96013b0fbfd3bf63d2940549e51317a89ee73b4e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v9 13/22] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples, not that we
will ultimately end up freezing them.

Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. The
new name makes it clear that it is about tuple hints rather than page
hints, and that it records something that happened rather than
something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 683c1762c25..669c088ccff 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 
 	/*
 	 * Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	bool		all_frozen_except_lp_dead = false;
 	bool		set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1128,7 +1128,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1715,7 +1715,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
 	 * tuple to know whether or not the page will be totally frozen.
 	 */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v9-0014-Remove-xl_heap_visible-entirely.patch (24.1K, 14-v9-0014-Remove-xl_heap_visible-entirely.patch)
  download | inline diff:
From faf936042bbe225175e8bc6474d3617e70cb215d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v9 14/22] Remove xl_heap_visible entirely

xl_heap_visible now has no users, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 152 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 32 insertions(+), 364 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 893a739009a..cb16bb0cbbd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2523,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbits(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8798,49 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11c11929ed9..ff3ad8b4cd2 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
 
 	/*
 	 * After xl_heap_prune is the optional snapshot conflict horizon.
@@ -250,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -269,142 +271,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -785,15 +651,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
-
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
@@ -1374,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 669c088ccff..ecc100c3362 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -980,8 +980,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		{
 			Assert(PageIsAllVisible(page));
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75c10ba20c6..2ff67d77cb4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,8 +1886,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbits(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2753,9 +2753,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		PageSetAllVisible(page);
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-		visibilitymap_set_vmbits(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index bb8dfd8910a..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *      visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index c95d30dfe8d..47998f1df15 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -343,13 +343,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -454,9 +447,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v9-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (7.1K, 15-v9-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch)
From 00012be836b472c2f0185b1c037cf29b480e5507 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v9 15/22] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all for the purposes of determining whether the page can
be set all-visible in the VM. In that case, it makes more sense to call
the function GlobalVisXidVisibleToAll().

GlobalVisTestIsRemovableFullXid() is renamed to
GlobalVisFullXidVisibleToAll() to match.
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ecc100c3362..73ca4e88c1f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1173,11 +1173,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v9-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch (10.8K, 16-v9-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch)
From 438ce859c03936b016e1345be5e9d5950d96f514 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v9 16/22] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page. If we have
maintained the visibility_cutoff_xid, we then compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see that xmin, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, since
previously we could set all_visible to false as soon as we encountered a
live tuple newer than OldestXmin. However, these extra comparisons were
found not to be significant in a profile.
---
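A minimal standalone sketch of the once-per-page check described in the
commit message, not the patch's code. ToyXid, ToyVisState and the toy_*
helpers are hypothetical stand-ins invented for illustration: the sketch
models the horizon as a single xid rather than the real GlobalVisState,
and it assumes every xmin passed in belongs to a live, committed tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are PostgreSQL's real definitions. */
typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

typedef struct ToyVisState
{
	ToyXid		horizon;		/* xids preceding this are visible to all */
} ToyVisState;

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b", valid for normal xids only. */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) > 0;
}

/* The (comparatively expensive) horizon test we want to run only once. */
static bool
toy_xid_visible_to_all(const ToyVisState *state, ToyXid xid)
{
	return toy_xid_follows(state->horizon, xid);
}

/*
 * Track the newest normal xmin while walking the page, then compare that
 * single value against the horizon after the loop.
 */
static bool
toy_page_all_visible(const ToyVisState *state,
					 const ToyXid *xmins, size_t ntuples,
					 ToyXid *cutoff_out)
{
	ToyXid		cutoff = TOY_INVALID_XID;

	assert(state != NULL && cutoff_out != NULL);

	for (size_t i = 0; i < ntuples; i++)
	{
		/* Non-normal (e.g. frozen) xmins never move the cutoff. */
		if (!toy_xid_is_normal(xmins[i]))
			continue;
		if (cutoff == TOY_INVALID_XID || toy_xid_follows(xmins[i], cutoff))
			cutoff = xmins[i];
	}

	*cutoff_out = cutoff;

	/* One horizon comparison per page instead of one per tuple. */
	if (toy_xid_is_normal(cutoff) && !toy_xid_visible_to_all(state, cutoff))
		return false;

	return true;
}
```

The loop has no early exit, so every xmin is examined even when an early
tuple would already disqualify the page; that is the trade-off the commit
message mentions.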
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 19 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 73ca4e88c1f..273e9412a01 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1099,12 +1106,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1629,19 +1634,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2ff67d77cb4..7558ac697f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2715,7 +2715,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3458,13 +3458,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3483,7 +3483,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3504,7 +3504,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3576,8 +3576,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3596,8 +3596,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v9-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (29.0K, 17-v9-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch)
From 73cfbe246ba075db052afd207749a7c66ec1a9bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v9 12/22] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 456 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 279 insertions(+), 222 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3483b5caff3..683c1762c25 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that old_vmbits and new_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping. Initializing all_visible to false allows skipping the work
+	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,12 +965,48 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
+
+		/*
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
+		 */
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1018,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples whose xmax is younger than the
+			 * conflict_xid calculated so far, we must use that xmax instead.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -922,124 +1079,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1621,7 +1709,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2184,7 +2277,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a singel WAL record for them
@@ -2238,6 +2331,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
 	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 	xlrec.flags = vmflags;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05d3d2a3267..75c10ba20c6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2013,34 +2013,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2074,8 +2046,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
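
Editorial sketch (not part of the patch): with all_visible and all_frozen
removed from PruneFreezeResult, a caller that wants to know which bits were
newly set can compare old_vmbits with new_vmbits. The example below is a
minimal illustration under stated assumptions. The flag values are assumed to
mirror VISIBILITYMAP_ALL_VISIBLE (0x01) and VISIBILITYMAP_ALL_FROZEN (0x02);
VmBitsDelta and vmbits_delta() are hypothetical names that do not appear in
the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the visibility map flag bits; names are hypothetical. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical summary of which VM bits were newly set for a page. */
typedef struct VmBitsDelta
{
	bool		newly_all_visible;
	bool		newly_all_frozen;
} VmBitsDelta;

/*
 * Compare the VM bits before and after an update and report the bits that
 * were clear before and are set now.
 */
static VmBitsDelta
vmbits_delta(uint8_t old_vmbits, uint8_t new_vmbits)
{
	VmBitsDelta d;
	uint8_t		newly_set = (uint8_t) (new_vmbits & ~old_vmbits);

	/* A page is never all-frozen without also being all-visible. */
	assert(!(new_vmbits & VM_ALL_FROZEN) || (new_vmbits & VM_ALL_VISIBLE));

	d.newly_all_visible = (newly_set & VM_ALL_VISIBLE) != 0;
	d.newly_all_frozen = (newly_set & VM_ALL_FROZEN) != 0;
	return d;
}
```

This only shows the bit arithmetic; how vacuum actually buckets pages for its
log output may differ.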



  [text/x-patch] v9-0017-Inline-TransactionIdFollows-Precedes.patch (4.9K, 18-v9-0017-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From 5be056f48478db42dc0ad09d480e091cd8c53ebe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v9 17/22] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
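
Editorial sketch (not part of the patch): the functions moved into transam.h
compare normal XIDs modulo 2^32 by looking at the sign of the 32-bit
difference, and fall back to plain unsigned comparison when either XID is a
permanent one. The standalone example below reproduces that logic so the
wraparound behavior can be checked in isolation. xid_t, FIRST_NORMAL_XID and
the xid_* helpers are hypothetical stand-ins for TransactionId,
FirstNormalTransactionId and the real functions; the value 3 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and FirstNormalTransactionId. */
typedef uint32_t xid_t;
#define FIRST_NORMAL_XID ((xid_t) 3)

static inline bool
xid_is_normal(xid_t xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Is id1 logically < id2?  Same structure as TransactionIdPrecedes(). */
static inline bool
xid_precedes(xid_t id1, xid_t id2)
{
	int32_t		diff;

	/* Permanent XIDs are compared as plain unsigned integers. */
	if (!xid_is_normal(id1) || !xid_is_normal(id2))
		return id1 < id2;

	/*
	 * Unsigned subtraction wraps modulo 2^32; reinterpreting the result as
	 * signed (two's complement assumed) yields the circular ordering.
	 */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

/* Small self-check showing that ordering survives counter wraparound. */
static void
xid_wraparound_demo(void)
{
	xid_t		before_wrap = UINT32_MAX - 5;
	xid_t		after_wrap = 10;

	assert(xid_precedes(100, 200));
	assert(xid_precedes(before_wrap, after_wrap));
	assert(!xid_precedes(after_wrap, before_wrap));
}
```

Because the helpers are static inline in a header, each call site can be
compiled down to a subtraction and a sign test, which is consistent with the
overhead concern stated in the commit message.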



  [text/x-patch] v9-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.3K, 19-v9-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From a83906def96db35ce75f93b3488ad64fc81b067f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v9 19/22] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 67 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 +++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  4 +-
 16 files changed, 278 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb16bb0cbbd..d07693b7075 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba3faab91fd..4400bf583dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, pruning may set the page's bits in the visibility
+ * map if the page is found to be all-visible. We take care of pinning and, if
+ * needed, reading in the relevant visibility map page.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
@@ -2275,8 +2310,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Time::HiRes qw(usleep);
 use Test::More;
+use Time::HiRes qw(usleep);
 
 if ($ENV{enable_injection_points} ne 'yes')
 {
@@ -296,6 +297,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
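
The flag handling in table_beginscan_vmset() and table_beginscan_bm() above
can be illustrated outside the tree. This is a minimal sketch, not code from
the patch: every TOY_ name and toy_ function is hypothetical, and only the
position of the VM-set bit (1 << 10) is taken from the diff; the other bit
positions are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for ScanOptions bits. Only the position of the
 * VM-set bit (1 << 10) is taken from the patch; the others are illustrative.
 */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_TYPE_BITMAPSCAN = 1 << 1,
	TOY_SO_ALLOW_PAGEMODE = 1 << 8,
	TOY_SO_ALLOW_VM_SET = 1 << 10
};

/* Build scan flags; request VM setting only if the query leaves rel alone. */
static uint32_t
toy_scan_flags(uint32_t base_flags, bool modifies_rel)
{
	uint32_t	flags = base_flags;

	if (!modifies_rel)
		flags |= TOY_SO_ALLOW_VM_SET;

	return flags;
}

/* Hypothetical check a scan could make before trying to set the VM. */
static bool
toy_scan_may_set_vm(uint32_t flags)
{
	return (flags & TOY_SO_ALLOW_VM_SET) != 0;
}

/* Small self-check exercising both cases. */
static void
toy_check_scan_flags(void)
{
	uint32_t	seq = TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;

	assert(toy_scan_may_set_vm(toy_scan_flags(seq, false)));
	assert(!toy_scan_may_set_vm(toy_scan_flags(seq, true)));
}
```

Calling toy_check_scan_flags() from a test driver exercises both cases: a
scan of a relation the query does not modify carries the bit, and a scan of
a modified relation does not.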



  [text/x-patch] v9-0018-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 20-v9-0018-Unset-all-visible-sooner-if-not-freezing.patch)
From 2dc6f7ada64352284c96e5f0d069913a6f1f6eef Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v9 18/22] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 273e9412a01..ba3faab91fd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1494,8 +1494,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1753,8 +1756,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
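
The difference between the delayed and the immediate unsetting of
all-visible described in this commit message can be sketched with a toy
state struct. This is only an illustration under stated assumptions:
ToyPruneState, toy_record_dead(), and toy_finalize() are hypothetical names
that mirror a few PruneState fields, and the "finalize" step stands in for
the later point at which the vacuum path clears the flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ITEMS 8

/* Hypothetical, heavily reduced stand-in for PruneState. */
typedef struct ToyPruneState
{
	bool		attempt_freeze; /* true for vacuum, false for on-access */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
	uint16_t	deadoffsets[TOY_MAX_ITEMS];
} ToyPruneState;

/* Record a dead item; without freezing there is no reason to wait. */
static void
toy_record_dead(ToyPruneState *st, uint16_t offnum)
{
	assert(st->lpdead_items < TOY_MAX_ITEMS);

	if (!st->attempt_freeze)
		st->all_visible = st->all_frozen = false;

	st->deadoffsets[st->lpdead_items++] = offnum;
}

/* Later step: the flags must reflect dead items before the VM is updated. */
static void
toy_finalize(ToyPruneState *st)
{
	if (st->lpdead_items > 0)
		st->all_visible = st->all_frozen = false;
}
```

With attempt_freeze set, all_visible survives toy_record_dead() and is only
cleared in toy_finalize(); with it unset, the flag is cleared at the first
dead item, so no further bookkeeping depends on it.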



  [text/x-patch] v9-0021-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 21-v9-0021-Reorder-heap_page_prune_and_freeze-parameters.patch)
From fd56683e500e528ee9da99a7326368aca8cb8bac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v9 21/22] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b9f85d1452e..53cb81d2510 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7558ac697f1..99b9cab0974 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1991,11 +1991,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v9-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch (18.9K, 22-v9-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch)
From b4a28cf0ab6cd86be2abc4ff20ecf7e99ed13cf4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v9 20/22] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup -- where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller-provided options,
   heuristics, and state gathered during stage 2 whether or not to
   freeze tuples and set the page in the VM
4) execution -- where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
 1 file changed, 295 insertions(+), 176 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4400bf583dd..b9f85d1452e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
-
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
-	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
-	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
 	/* Save these for the caller in case we later zero out vmflags */
 	presult->new_vmbits = vmflags;
 
-	/* Any error while applying the changes is critical */
+	/*
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
+	 */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0
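
The core condition in heap_page_will_update_vm() that picks which VM bits to
set can be exercised in isolation. The sketch below is a simplification, not
the patch's code: toy_choose_vmflags() and the TOY_VM_ constants are
hypothetical, the VM_ALL_FROZEN() lookup is replaced by a plain boolean
argument, and the corruption-repair and on-access early-exit branches are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the visibility map flag bits. */
#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02

/*
 * Decide which bits to set. blk_known_av says the caller already saw the
 * all-visible bit set in the VM; vm_all_frozen says the all-frozen bit is
 * already set there.
 */
static uint8_t
toy_choose_vmflags(bool all_visible, bool all_frozen,
				   bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		flags = 0;

	/* A page cannot be all-frozen without also being all-visible. */
	assert(!all_frozen || all_visible);

	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		flags |= TOY_VM_ALL_VISIBLE;
		if (all_frozen)
			flags |= TOY_VM_ALL_FROZEN;
	}

	return flags;
}
```

A zero result corresponds to do_set_vm being false: either the page is not
all-visible, or the VM already records everything the page state would
justify.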



  [text/x-patch] v9-0022-Set-pd_prune_xid-on-insert.patch (6.5K, 23-v9-0022-Set-pd_prune_xid-on-insert.patch)
From 9b5273ec435a8025295c3cfbded611795b50f4d8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v9 22/22] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which in turn changes the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked,
which sometimes leads to them being double counted. This should probably
be fixed or changed independently.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d07693b7075..02aa2383c50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index ff3ad8b4cd2..e7d7804871b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -470,6 +470,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -619,9 +625,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
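
The hint-setting logic in the 0022 patch above can be sketched in isolation. This is a toy model, not PostgreSQL code: ToyPage, toy_insert() and the other toy_* names are hypothetical, and the "keep the oldest xid" rule is an assumption about how PageSetPrunable() behaves. The constants assume that xids below 3 are reserved (invalid, bootstrap, frozen), which is why a bootstrap-mode insert skips the hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these are PostgreSQL's real types or macros. */
typedef uint32_t ToyXid;

#define TOY_INVALID_XID			0
#define TOY_FIRST_NORMAL_XID	3	/* assumed: 0..2 are reserved xids */

typedef struct ToyPage
{
	ToyXid		prune_xid;		/* models pd_prune_xid */
} ToyPage;

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparison, only meaningful for two normal xids. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Remember the oldest xid whose work might make the page prunable. */
static void
toy_page_set_prunable(ToyPage *page, ToyXid xid)
{
	assert(toy_xid_is_normal(xid));

	if (page->prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, page->prune_xid))
		page->prune_xid = xid;
}

/* Models the insert path: set the hint unless the xid is not normal. */
static void
toy_insert(ToyPage *page, ToyXid xid)
{
	if (toy_xid_is_normal(xid))
		toy_page_set_prunable(page, xid);
}
```

Calling toy_insert() with xid 1 on a fresh page leaves prune_xid invalid, while later inserts by xids 100, 150 and 90 leave it at 100, 100 and 90 respectively.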



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-05 22:27  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-09-05 22:27 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Sep 3, 2025 at 5:06 AM Kirill Reshke <[email protected]> wrote:
>
> small comments regarding new series
>
> 0001, 0002, 0017 LGTM

Thanks for continuing to review!

> In 0015:
>
> Also, maybe GlobalVisXidTestAllVisible is a slightly better name? (The
> term 'all-visible' is one that we occasionally utilize)

Actually, I was trying to distinguish it from all-visible because I
interpret that to mean everything is visible -- as in, every tuple on
a page is visible to everyone. Here we are referring to a single xid
and want to know whether it is visible to everyone, i.e. no longer
considered running by anyone. I don't think my name ("visible-to-all")
is good, but I'm hesitant to co-opt "all-visible" here.
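
To make the distinction concrete, here is a toy sketch (not PostgreSQL code; ToyTuple and the toy_* functions are made up for illustration). It assumes xids never wrap around and ignores deleted tuples, so it only shows the difference in scope: one predicate is about a single xid, the other about every tuple on a page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are PostgreSQL's real types or functions. */
typedef uint32_t ToyXid;

typedef struct ToyTuple
{
	ToyXid		xmin;			/* xid that inserted the tuple */
	bool		xmin_committed;	/* did that xid commit? */
} ToyTuple;

/*
 * "Visible to all": a question about ONE xid. With no wraparound, a plain
 * comparison against the oldest xid still considered running is enough.
 */
static bool
toy_xid_visible_to_all(ToyXid xid, ToyXid oldest_running)
{
	return xid < oldest_running;
}

/*
 * "All-visible": a question about a WHOLE page, i.e. every tuple on it was
 * inserted by a committed xid that is itself visible to all.
 */
static bool
toy_page_all_visible(const ToyTuple *tuples, size_t ntuples,
					 ToyXid oldest_running)
{
	assert(tuples != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		if (!tuples[i].xmin_committed ||
			!toy_xid_visible_to_all(tuples[i].xmin, oldest_running))
			return false;
	}
	return true;
}
```

With tuples inserted by xids 100 and 205 and an oldest running xid of 200, xid 100 is visible to all but the page is not all-visible.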

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 15:44  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-09-08 15:44 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
<[email protected]> wrote:
>
> > On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
> > > From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman <[email protected]>
> > > Date: Wed, 27 Aug 2025 08:50:15 -0400
> > > Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
>
> I didn't push it yet because I did a new version that actually
> eliminates the asserts in heap_multi_insert() before calling
> visibilitymap_set() -- since they are redundant with checks inside
> visibilitymap_set(). 0001 of attached v9 is what I plan to push,
> barring any objections.

I pushed this, so rebased v10 is attached. I've added one new patch:
0002 adds ERRCODE_DATA_CORRUPTED to the existing log messages about
VM/data corruption in vacuum. Andrey Borodin earlier suggested this,
and I had neglected to include it.

- Melanie


Attachments:

  [text/x-patch] v10-0002-Add-error-codes-to-vacuum-VM-corruption-case-log.patch (2.3K, 2-v10-0002-Add-error-codes-to-vacuum-VM-corruption-case-log.patch)
  download | inline diff:
From bf1d4ed090ca4f30d382cb9ff028565967bed5db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Sep 2025 10:00:34 -0400
Subject: [PATCH v10 02/22] Add error codes to vacuum VM corruption case
 logging

Enhance the log message emitted when the heap page is found not
to be consistent with the VM during vacuum.

PD_ALL_VISIBLE must never be clear if the VM bits are set for this page.
And a page marked all-visible in the VM must not contain dead items.
Both of these cases are either data corruption or VM corruption.

Add ERRCODE_DATA_CORRUPTED to the existing log message. Using the
appropriate error codes makes monitoring much easier.

Suggested-by: Andrey Borodin <[email protected]>
Author: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/87DD95AA-274F-4F4F-BAD9-7738E5B1F905%40yandex-team.ru
---
 src/backend/access/heap/vacuumlazy.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 932701d8420..8bea0454ff5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2121,8 +2121,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
 			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
 	{
-		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
+		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+						  errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+								 vacrel->relname, blkno)));
+
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
 	}
@@ -2143,8 +2145,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
 	{
-		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-			 vacrel->relname, blkno);
+		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+						  errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+								 vacrel->relname, blkno)));
+
 		PageClearAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-- 
2.43.0



  [text/x-patch] v10-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (11.3K, 3-v10-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
  download | inline diff:
From 2a418d3dee217cbf411ec96a7a6b95831077f887 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v10 03/22] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 47 ++++++++++-------
 src/backend/access/heap/heapam_xlog.c   | 43 +++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 144 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..893a739009a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2516,8 +2513,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbits(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2576,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2644,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2657,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..0820f7d052d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,46 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		visibilitymap_set_vmbits(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..bb8dfd8910a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *      visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
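
The bit arithmetic that visibilitymap_set_vmbits() in the 0003 patch above relies on can be tried outside the server. This is a minimal sketch on a plain byte array, not the patch's implementation: the toy_* names are hypothetical, and the constants are assumptions meant to mirror a default build (two VM bits per heap block, an 8192-byte VM page with a 24-byte header). It leaves out buffer pinning, locking and WAL entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; assumed to mirror a default 8kB-block build. */
#define TOY_ALL_VISIBLE			0x01
#define TOY_ALL_FROZEN			0x02
#define TOY_VALID_BITS			0x03
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	4
#define TOY_MAPSIZE				8168	/* assumed: 8192 - 24 byte header */
#define TOY_HEAPBLOCKS_PER_PAGE	(TOY_MAPSIZE * TOY_HEAPBLOCKS_PER_BYTE)

/* Which VM page holds the bits for this heap block? */
static uint32_t
toy_map_block(uint32_t heap_blk)
{
	return heap_blk / TOY_HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page? */
static uint32_t
toy_map_byte(uint32_t heap_blk)
{
	return (heap_blk % TOY_HEAPBLOCKS_PER_PAGE) / TOY_HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of the two-bit group within that byte. */
static uint8_t
toy_map_offset(uint32_t heap_blk)
{
	return (uint8_t) ((heap_blk % TOY_HEAPBLOCKS_PER_BYTE) *
					  TOY_BITS_PER_HEAPBLOCK);
}

/*
 * OR the requested flags into the map and return the previous status.
 * *dirtied stands in for MarkBufferDirty(): it is set only when the
 * previous status differs from the requested flags.
 */
static uint8_t
toy_set_vmbits(uint8_t *map, uint32_t heap_blk, uint8_t flags, bool *dirtied)
{
	uint32_t	map_byte = toy_map_byte(heap_blk);
	uint8_t		map_offset = toy_map_offset(heap_blk);
	uint8_t		status;

	/* Only valid bits, and never all-frozen without all-visible. */
	assert((flags & TOY_VALID_BITS) == flags);
	assert(flags != TOY_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & TOY_VALID_BITS;
	*dirtied = (flags != status);
	if (*dirtied)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

For heap block 5, the bits live in byte 1 at bit offset 2, so setting both flags on a zeroed map leaves that byte at 0x0C, and a second identical call changes nothing.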



  [text/x-patch] v10-0001-Remove-unused-xl_heap_prune-member-reason.patch (1.1K, 4-v10-0001-Remove-unused-xl_heap_prune-member-reason.patch)
  download | inline diff:
From 17aaea61d6d2f24d9271b5cd122c7ba5c3a31cdd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Sep 2025 15:02:58 -0400
Subject: [PATCH v10 01/22] Remove unused xl_heap_prune member, reason

f83d709760d8 refactored xl_heap_prune and added an unused member,
reason. While PruneReason is used when constructing this WAL record to
set the WAL record definition, it doesn't need to be stored in a
separate field in the record. Remove it.

Author: Melanie Plageman <[email protected]>
Reported-by: Andres Freund <[email protected]>

Discussion: https://postgr.es/m/tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y%40hbijsndifu45
---
 src/include/access/heapam_xlog.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d4c0625b632 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -284,7 +284,6 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		reason;
 	uint8		flags;
 
 	/*
-- 
2.43.0



  [text/x-patch] v10-0004-Make-heap_page_is_all_visible-independent-of-LVR.patch (5.4K, 5-v10-0004-Make-heap_page_is_all_visible-independent-of-LVR.patch)
  download | inline diff:
From 1953de8af1b47cd8309859da66e73f3eaeceb878 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v10 04/22] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bea0454ff5..c02eca36c88 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2910,8 +2914,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3594,10 +3598,18 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3605,9 +3617,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3630,7 +3644,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3654,9 +3668,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3677,7 +3691,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3712,7 +3726,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [text/x-patch] v10-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (28.4K, 6-v10-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
  download | inline diff:
From 42341137bce0a86523f864ea57380e0285f18396 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v10 05/22] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 145 ++++++++++++++++++----
 src/backend/access/heap/pruneheap.c    |  66 ++++++++--
 src/backend/access/heap/vacuumlazy.c   | 164 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |   7 +-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |  36 ++++--
 6 files changed, 330 insertions(+), 97 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0820f7d052d..11c11929ed9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +156,117 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, measure the page's freespace to later update the
+	 * freespace map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode), which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is _only_ okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
-			UnlockReleaseBuffer(buffer);
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+
+		FreeFakeRelcacheEntry(reln);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+	xlrec.flags = vmflags;
 
-	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c02eca36c88..e35cb629261 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2850,8 +2852,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2862,6 +2867,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2881,6 +2900,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		PageSetAllVisible(page);
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		visibilitymap_set_vmbits(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2890,7 +2921,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2899,39 +2933,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3598,40 +3605,85 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
  *
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of LP_DEAD items that the caller already knows
+ * about and whose index entries it has already removed. Vacuum will call this
+ * before setting those line pointers LP_UNUSED. So, if there are no new
+ * LP_DEAD items, then the page can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
  *
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
  *
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
  *
  * *logging_offnum will have the OffsetNumber of the current tuple being
  * processed for vacuum's error callback system.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal().  If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3659,9 +3711,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..c95d30dfe8d 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -279,7 +279,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			TransactionId conflict_xid;
 
 			memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
-
 			appendStringInfo(buf, "snapshotConflictHorizon: %u",
 							 conflict_xid);
 		}
@@ -287,6 +286,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool vm_modified_heap_page,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
 
-/* to handle recovery conflict during logical decoding on standby */
-#define		XLHP_IS_CATALOG_REL			(1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define		XLHP_IS_CATALOG_REL			(1 << 2)
 
 /*
  * Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
  * marks LP_DEAD line pointers as unused without moving any tuple data, an
  * ordinary exclusive lock is sufficient.
  */
-#define		XLHP_CLEANUP_LOCK	       (1 << 2)
+#define		XLHP_CLEANUP_LOCK	       (1 << 3)
 
 /*
  * If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
  * there are no queries running for which the removed tuples are still
  * visible, or which still consider the frozen XIDs as running.
  */
-#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 3)
+#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
 
 /*
  * Indicates that an xlhp_freeze_plans sub-record and one or more
  * xlhp_freeze_plan sub-records are present.
  */
-#define		XLHP_HAS_FREEZE_PLANS		(1 << 4)
+#define		XLHP_HAS_FREEZE_PLANS		(1 << 5)
 
 /*
  * XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
  * indicate that xlhp_prune_items sub-records with redirected, dead, and
  * unused item offsets are present.
  */
-#define		XLHP_HAS_REDIRECTIONS		(1 << 5)
-#define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
-#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
+#define		XLHP_HAS_REDIRECTIONS		(1 << 6)
+#define		XLHP_HAS_DEAD_ITEMS	        (1 << 7)
+#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)
 
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
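
For anyone following the flag renumbering in heapam_xlog.h above, here is a
minimal standalone sketch of the resulting bit layout. The two VISIBILITYMAP_*
bits and the XLHP_* values are copied from the patch and visibilitymapdefs.h;
the helper names prune_flags_pack() and prune_flags_vmbits() are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Low two bits: the VM bits, same values as in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* XLHP_* values as renumbered by the patch above */
#define XLHP_IS_CATALOG_REL			(1 << 2)
#define XLHP_CLEANUP_LOCK			(1 << 3)
#define XLHP_HAS_CONFLICT_HORIZON	(1 << 4)
#define XLHP_HAS_FREEZE_PLANS		(1 << 5)
#define XLHP_HAS_REDIRECTIONS		(1 << 6)
#define XLHP_HAS_DEAD_ITEMS			(1 << 7)
#define XLHP_HAS_NOW_UNUSED_ITEMS	(1 << 8)

/* Hypothetical helper: combine VM bits and XLHP_* bits into one flags word. */
static uint16_t
prune_flags_pack(uint8_t vmflags, uint16_t xlhp_flags)
{
	/* vmflags may carry only the VM bits ... */
	assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
	/* ... and xlhp_flags must not overlap them */
	assert((xlhp_flags & VISIBILITYMAP_VALID_BITS) == 0);

	return (uint16_t) (vmflags | xlhp_flags);
}

/* Hypothetical helper: recover the VM bits, as the redo routine does. */
static uint8_t
prune_flags_vmbits(uint16_t flags)
{
	return (uint8_t) (flags & VISIBILITYMAP_VALID_BITS);
}
```

With the VM bits occupying bits 0 and 1, XLHP_HAS_NOW_UNUSED_ITEMS lands on
bit 8, which no longer fits in a uint8. That is why the patch widens
xl_heap_prune.flags to uint16.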


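The dead-item handling in heap_page_would_be_all_visible() relies on
deadoffsets[] being strictly sorted, so a single cursor can be advanced in step
with the page scan. A rough standalone sketch of just that matching step
follows. LpState and only_known_dead() are hypothetical stand-ins, and the
visibility checks on normal tuples are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state of one line pointer */
typedef enum
{
	LP_SKETCH_UNUSED,
	LP_SKETCH_NORMAL,
	LP_SKETCH_DEAD
} LpState;

/*
 * Return true if every dead item among items[0..nitems-1] appears in the
 * sorted deadoffsets[] array.  Offsets are 1-based, like OffsetNumbers.
 */
static bool
only_known_dead(const LpState *items, int nitems,
				const unsigned short *deadoffsets, int ndeadoffsets)
{
	int			matched = 0;	/* cursor into deadoffsets[] */

	assert(ndeadoffsets == 0 || deadoffsets != NULL);

	for (int off = 1; off <= nitems; off++)
	{
		if (items[off - 1] != LP_SKETCH_DEAD)
			continue;

		/* A dead item the caller did not tell us about: give up. */
		if (matched >= ndeadoffsets || deadoffsets[matched] != off)
			return false;

		matched++;
	}

	return true;
}
```

Because both the page scan and deadoffsets[] run in ascending offset order, the
check costs one comparison per dead item rather than a search per item.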

  [text/x-patch] v10-0006-Use-xl_heap_prune-record-for-setting-empty-pages.patch (5.8K, 7-v10-0006-Use-xl_heap_prune-record-for-setting-empty-pages.patch)
From c53c62b18414bb6de25bef4c4e428904828dfa5a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v10 06/22] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e35cb629261..67c853b586a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			visibilitymap_set_vmbits(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2921,6 +2935,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0



  [text/x-patch] v10-0009-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (12.0K, 8-v10-0009-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch)
From 76e13c0ec681ec1eaba065bc3e88f72e37b37621 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v10 09/22] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 91 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 82 +++----------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 100 insertions(+), 77 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..e0005c2d4f2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,68 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+						  errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+								 RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+						  errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+								 RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +385,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +424,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +974,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c1ae3a355c8..322e54c803f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1938,70 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
-						  errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-								 RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
-						  errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-								 RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2059,11 +1989,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2148,10 +2081,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0
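
Reviewer's note: the two checks that identify_and_fix_vm_corruption()
performs can be modeled in isolation. The sketch below is not PostgreSQL
code. ModelPage, VmCorruptionKind, classify_vm_corruption() and
fix_vm_corruption() are hypothetical stand-ins, and the two struct
fields only imitate what PageIsAllVisible() and
visibilitymap_get_status() report. It assumes, as the patch does, that
the caller holds the heap buffer lock so both values are stable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these exist in PostgreSQL. */
typedef enum
{
	VM_OK = 0,
	VM_BIT_SET_PAGE_BIT_CLEAR,		/* VM bit set, PD_ALL_VISIBLE clear */
	VM_LPDEAD_ON_ALL_VISIBLE_PAGE	/* LP_DEAD items, PD_ALL_VISIBLE set */
} VmCorruptionKind;

typedef struct
{
	bool		pd_all_visible; /* models PageIsAllVisible() */
	uint8_t		vm_status;		/* models visibilitymap_get_status() */
} ModelPage;

/* Apply the two checks in the same order as the patch. */
static VmCorruptionKind
classify_vm_corruption(const ModelPage *p, bool blk_known_av,
					   int64_t nlpdead_items)
{
	if (blk_known_av && !p->pd_all_visible && p->vm_status != 0)
		return VM_BIT_SET_PAGE_BIT_CLEAR;
	if (nlpdead_items > 0 && p->pd_all_visible)
		return VM_LPDEAD_ON_ALL_VISIBLE_PAGE;
	return VM_OK;
}

/* Returns true if corruption was found and cleared. */
static bool
fix_vm_corruption(ModelPage *p, bool blk_known_av, int64_t nlpdead_items)
{
	switch (classify_vm_corruption(p, blk_known_av, nlpdead_items))
	{
		case VM_BIT_SET_PAGE_BIT_CLEAR:
			p->vm_status = 0;	/* models visibilitymap_clear() */
			return true;
		case VM_LPDEAD_ON_ALL_VISIBLE_PAGE:
			p->pd_all_visible = false;	/* models PageClearAllVisible() */
			p->vm_status = 0;
			return true;
		case VM_OK:
			break;
	}
	return false;
}
```

In the model, as in the patch, a true return value means the caller
should skip the normal VM update for this page.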



  [text/x-patch] v10-0008-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 9-v10-0008-Combine-vacuum-phase-I-VM-update-cases.patch)
From 271fa7a624f6fe07ce96dc2a59b3ea5ae8303347 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v10 08/22] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2cc11e6d55d..c1ae3a355c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2155,11 +2155,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2172,21 +2187,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2207,66 +2230,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
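
Reviewer's note: the combined condition in lazy_scan_prune() can be
checked as a small truth table. The sketch below is an illustration, not
code from the patch. vm_bits_to_set() and the MODEL_VM_* constants are
hypothetical, and vm_all_frozen stands in for the result of
VM_ALL_FROZEN(). It assumes the all-frozen flag is OR'ed into the flags
whenever presult.all_frozen is true, which the surrounding hunk suggests
but does not show in full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VISIBILITYMAP_* flag bits. */
#define MODEL_VM_ALL_VISIBLE	0x01
#define MODEL_VM_ALL_FROZEN		0x02

/*
 * Return the VM bits the combined case would set, or 0 if the VM is left
 * alone. Mirrors:
 *   all_visible && (!all_visible_according_to_vm ||
 *                   (all_frozen && !vm_all_frozen))
 */
static uint8_t
vm_bits_to_set(bool all_visible, bool all_frozen,
			   bool all_visible_according_to_vm, bool vm_all_frozen)
{
	uint8_t		flags;

	/* all_frozen is only meaningful when all_visible is true */
	if (!all_visible)
		return 0;

	/* VM already says all-visible and there is no frozen bit to add */
	if (all_visible_according_to_vm && !(all_frozen && !vm_all_frozen))
		return 0;

	flags = MODEL_VM_ALL_VISIBLE;
	if (all_frozen)
		flags |= MODEL_VM_ALL_FROZEN;
	return flags;
}
```

The case that used to have its own branch (VM already all-visible, page
newly all-frozen) now yields both bits from the same path as the
"nothing set yet" case.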



  [text/x-patch] v10-0010-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (3.1K, 10-v10-0010-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch)
From 3ac96aa83bad6be7347b5103fb8b31d42c975f2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v10 10/22] Keep all_frozen updated too in
 heap_page_prune_and_freeze

We previously relied on all_visible and all_frozen only ever being used
together, but it is best to keep both of them up to date.

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e0005c2d4f2..10d030fb3e7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -828,6 +824,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1472,7 +1469,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1494,7 +1491,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1507,7 +1504,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1526,7 +1523,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1544,7 +1541,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0
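
Reviewer's note: the invariant this patch establishes (all_frozen
implies all_visible) can be stated on its own. The sketch below is a
model only. ModelPruneState and its helper functions are hypothetical
and merely imitate the two PruneState fields touched here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two PruneState flags. */
typedef struct
{
	bool		all_visible;
	bool		all_frozen;
} ModelPruneState;

/* The condition checked by the Assert added before the critical section. */
static bool
model_invariant_holds(const ModelPruneState *s)
{
	return !s->all_frozen || s->all_visible;
}

/* Before the patch: only all_visible was cleared, leaving all_frozen stale. */
static void
model_clear_old(ModelPruneState *s)
{
	s->all_visible = false;
}

/* After the patch: both flags are cleared together. */
static void
model_clear_new(ModelPruneState *s)
{
	s->all_visible = s->all_frozen = false;
	assert(model_invariant_holds(s));
}
```

Starting from a state where both flags are true, model_clear_old()
leaves a state that violates the invariant, while model_clear_new()
does not. That is why later patches can read all_frozen without first
checking all_visible.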



  [text/x-patch] v10-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.4K, 11-v10-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From 84457d090e0f176e84f14fadf95b095722d1e767 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v10 07/22] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
 1 file changed, 77 insertions(+), 45 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 67c853b586a..2cc11e6d55d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1932,6 +1938,70 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+						  errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+								 RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+						  errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+								 RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2078,9 +2148,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2132,49 +2207,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
-						  errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-								 vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
-						  errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-								 vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0



  [text/x-patch] v10-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch (29.0K, 12-v10-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch)
From a5be384e7806fe72c18a54df1a637cf93d16a0b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v10 12/22] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 456 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 279 insertions(+), 222 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9fd25d0d501..a415db2c01e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -375,12 +382,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Both old_vmbits and new_vmbits will
+ * be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -396,6 +406,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -440,18 +452,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -496,50 +514,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If neither freezing nor updating the VM, we avoid the extra
+	 * bookkeeping. Initializing both fields to false allows skipping the
+	 * work to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -737,10 +762,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -788,7 +814,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -827,11 +853,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = (vmflags & VISIBILITYMAP_VALID_BITS) != 0;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -847,15 +946,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -869,12 +969,48 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
+
+		/*
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
+		 */
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* Zero out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -886,35 +1022,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with an xmax younger than the
+			 * conflict_xid calculated so far, we must use that xmax instead.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -926,124 +1083,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1625,7 +1713,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * heap_prepare_freeze_tuple() is also what tells us whether the page will
+	 * be totally frozen. It is only called when freezing, which is why
+	 * all_frozen starts out false when we are only updating the VM.
+	 */
 	if (prstate->freeze)
 	{
 		bool		totally_frozen;
@@ -2188,7 +2281,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a singel WAL record for them
@@ -2242,6 +2335,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
 	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 	xlrec.flags = vmflags;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05d3d2a3267..75c10ba20c6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2013,34 +2013,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2074,8 +2046,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
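
Reviewer note (not part of the patch): with the VM update folded into the prune/freeze record, a single snapshot conflict horizon has to cover pruning, freezing and the VM change. The sketch below restates the selection order from the patch as a standalone function, under stated assumptions. choose_conflict_xid, xid_follows and xid_retreat are hypothetical names; the two helpers are simplified stand-ins written to behave like TransactionIdFollows() and TransactionIdRetreat(), and setting_all_frozen stands for (vmflags & VISIBILITYMAP_ALL_FROZEN).

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Stand-in for TransactionIdFollows(): is a logically newer than b? */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	/* Normal XIDs compare modulo 2^32 to tolerate wraparound. */
	return (int32_t) (a - b) > 0;
}

/* Stand-in for TransactionIdRetreat(): step back, skipping special XIDs. */
static TransactionId
xid_retreat(TransactionId xid)
{
	do
	{
		xid--;
	} while (xid < FirstNormalTransactionId);
	return xid;
}

/* Hypothetical helper mirroring the order of checks in the patch. */
TransactionId
choose_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					bool all_frozen_except_lp_dead, bool blk_known_av,
					bool setting_all_frozen,
					TransactionId visibility_cutoff_xid,
					TransactionId oldest_xmin,
					TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* VM update, or freezing a page that ends up all-frozen. */
	if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
		conflict_xid = visibility_cutoff_xid;
	/* Partial freeze: fall back to the pessimistic OldestXmin - 1. */
	else if (do_freeze)
		conflict_xid = xid_retreat(oldest_xmin);

	/* Removed tuples with a newer xmax raise the horizon. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* Only adding all-frozen to an already all-visible page: no conflict. */
	if (!do_prune && !do_freeze && do_set_vm &&
		blk_known_av && setting_all_frozen)
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

For example, a page that is only being marked all-visible with a visibility cutoff of 100 gets a horizon of 100, while a partial freeze with OldestXmin 200 gets 199 unless a removed tuple had a newer xmax.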



  [text/x-patch] v10-0011-Update-VM-in-pruneheap.c.patch (12.5K, 13-v10-0011-Update-VM-in-pruneheap.c.patch)
From d58838ac3ea61a10b07c792610e6fdd23f5ef487 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v10 11/22] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 10d030fb3e7..9fd25d0d501 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -940,7 +943,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -956,31 +959,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM.  Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 322e54c803f..05d3d2a3267 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1947,7 +1947,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1976,6 +1977,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1984,10 +1986,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2079,88 +2077,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0



  [text/x-patch] v10-0014-Remove-xl_heap_visible-entirely.patch (24.1K, 14-v10-0014-Remove-xl_heap_visible-entirely.patch)
From edae8164cc6caf87dfeeb620ac0214ddad4e1b83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v10 14/22] Remove xl_heap_visible entirely

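No code emits or replays xl_heap_visible anymore, so remove the record
type along with log_heap_visible() and heap_xlog_visible(). With the
old WAL-logging visibilitymap_set() gone, rename
visibilitymap_set_vmbits() to visibilitymap_set().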
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 152 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 32 insertions(+), 364 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 893a739009a..cb16bb0cbbd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2523,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbits(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8798,49 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11c11929ed9..ff3ad8b4cd2 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
 
 	/*
 	 * After xl_heap_prune is the optional snapshot conflict horizon.
@@ -250,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -269,142 +271,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -785,15 +651,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
-
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
@@ -1374,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 96b2d58e40c..3fe9db99c0d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -984,8 +984,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		{
 			Assert(PageIsAllVisible(page));
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75c10ba20c6..2ff67d77cb4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,8 +1886,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			MarkBufferDirty(buf);
 
 			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-			visibilitymap_set_vmbits(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2753,9 +2753,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		set_pd_all_vis = true;
 		PageSetAllVisible(page);
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
-		visibilitymap_set_vmbits(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index bb8dfd8910a..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *      visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index c95d30dfe8d..47998f1df15 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -343,13 +343,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -454,9 +447,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v10-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (4.1K, 15-v10-0013-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 273f80c9d6ba29b9f689584a48c0e28e65280287 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v10 13/22] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a415db2c01e..96b2d58e40c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 
 	/*
 	 * Whether or not to consider updating the VM. There is some bookkeeping
@@ -456,7 +456,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	bool		all_frozen_except_lp_dead = false;
 	bool		set_pd_all_visible = false;
@@ -464,7 +464,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
@@ -489,7 +489,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -539,7 +539,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -657,7 +657,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -774,7 +774,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -807,7 +807,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1132,7 +1132,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1719,7 +1719,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
 	 * tuple to know whether or not the page will be totally frozen.
 	 */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v10-0016-Use-GlobalVisState-to-determine-page-level-visib.patch (10.8K, 16-v10-0016-Use-GlobalVisState-to-determine-page-level-visib.patch)
From c3053237e5ee43c8c2a023b1f6e1a018fe55de2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v10 16/22] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. Then, if the
visibility_cutoff_xid has been maintained, we compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, as we may
have set all_visible to false sooner when encountering a live tuple
newer than OldestXmin. However, these extra comparisons were found not
to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 19 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 30736519191..5e8748b15ef 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -557,14 +556,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -760,6 +757,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1103,12 +1110,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1633,19 +1638,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2ff67d77cb4..7558ac697f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2715,7 +2715,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3458,13 +3458,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3483,7 +3483,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3504,7 +3504,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3576,8 +3576,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3596,8 +3596,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
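
The once-per-page check described in the commit message above can be sketched as follows. This is a minimal illustration and not the patch's code: Xid, VisHorizon, xid_visible_to_all() and page_is_all_visible() are hypothetical stand-ins, and the real GlobalVisState tracks more than the single horizon assumed here. The loop only tracks the newest normal xmin, and the horizon is consulted once after the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;               /* stand-in for TransactionId */

#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)  /* XIDs below this are special */

/* Hypothetical, simplified stand-in for GlobalVisState: one horizon. */
typedef struct
{
    Xid horizon;    /* XIDs logically before this are visible to all */
} VisHorizon;

static bool
xid_is_normal(Xid xid)
{
    return xid >= FIRST_NORMAL_XID;
}

/* Is id1 logically < id2, using modulo-2^32 arithmetic for normal XIDs? */
static bool
xid_precedes(Xid id1, Xid id2)
{
    if (!xid_is_normal(id1) || !xid_is_normal(id2))
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

static bool
xid_visible_to_all(const VisHorizon *vis, Xid xid)
{
    return xid_precedes(xid, vis->horizon);
}

/*
 * xmins holds the xmin of each live, committed tuple on the page.
 * *cutoff_xid receives the newest normal xmin (or INVALID_XID).
 */
static bool
page_is_all_visible(const VisHorizon *vis, const Xid *xmins,
                    size_t ntuples, Xid *cutoff_xid)
{
    Xid newest = INVALID_XID;

    assert(vis != NULL && cutoff_xid != NULL);

    /* Per tuple: only a cheap XID-to-XID comparison. */
    for (size_t i = 0; i < ntuples; i++)
    {
        if (xid_is_normal(xmins[i]) && xid_precedes(newest, xmins[i]))
            newest = xmins[i];
    }
    *cutoff_xid = newest;

    /* Per page: a single, more expensive horizon check. */
    if (xid_is_normal(newest) && !xid_visible_to_all(vis, newest))
        return false;
    return true;
}
```

With a horizon of 1000, a page whose live tuples have xmins 100 and 500 passes the check with a cutoff of 500, while a page containing an xmin of 1500 does not.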



  [text/x-patch] v10-0018-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 17-v10-0018-Unset-all-visible-sooner-if-not-freezing.patch)
From 09df07e1794c06cb435c641ad57250672eb16215 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v10 18/22] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e8748b15ef..e6cf4df17c5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1498,8 +1498,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1757,8 +1760,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
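
The difference between the two behaviors described above can be illustrated with a small sketch. PruneSketch and the sketch_* functions are hypothetical and only model the flags involved; they are not the patch's PruneState. It assumes both flags start out true, and that the deferred case clears them in a final step before the VM would be updated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the page-level flags tracked while pruning. */
typedef struct
{
    bool attempt_freeze;    /* caller asked us to consider freezing */
    bool all_visible;
    bool all_frozen;
    int  lpdead_items;
} PruneSketch;

static void
sketch_init(PruneSketch *ps, bool attempt_freeze)
{
    ps->attempt_freeze = attempt_freeze;
    ps->all_visible = true;
    ps->all_frozen = true;
    ps->lpdead_items = 0;
}

/* Called for each dead item found on the page. */
static void
sketch_record_dead(PruneSketch *ps)
{
    /*
     * When freezing may be attempted, keep the flags set so the page can
     * still be considered for opportunistic freezing. Otherwise, clear
     * them now and skip further bookkeeping.
     */
    if (!ps->attempt_freeze)
        ps->all_visible = ps->all_frozen = false;

    ps->lpdead_items++;
}

/* Called once, after the freeze decision and before any VM update. */
static void
sketch_finish(PruneSketch *ps)
{
    assert(ps->lpdead_items >= 0);

    if (ps->lpdead_items > 0)
        ps->all_visible = ps->all_frozen = false;
}
```

In both modes the flags end up false once a dead item has been recorded; the only difference is when they are cleared.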



  [text/x-patch] v10-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 18-v10-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 69ff6eecfcab1cedabea8a11f61d2c688f700a61 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v10 15/22] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all for the purpose of determining whether the page can be
set all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3fe9db99c0d..30736519191 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -578,9 +578,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1177,11 +1177,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v10-0017-Inline-TransactionIdFollows-Precedes.patch (5.0K, 19-v10-0017-Inline-TransactionIdFollows-Precedes.patch)
From 6d261819f8b35946698c540b87260e3c49883c0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v10 17/22] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
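
Note on the inlined comparison functions above: the comment on
TransactionIdPrecedes() describes an unsigned comparison for permanent XIDs
and a modulo-2^32 comparison for normal ones. The following is a minimal
standalone sketch of that rule, not the patch itself. The names
xid_precedes() and xid_precedes_selfcheck() are made up for illustration;
the typedef and macros are local stand-ins for the transam.h definitions.
It assumes the usual two's-complement behavior when a uint32 difference is
cast to int32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the transam.h definitions (XIDs 0..2 are permanent). */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical name; mirrors the logic of TransactionIdPrecedes(). */
static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Permanent XIDs are compared as plain unsigned integers. */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;

	/* Normal XIDs: the sign of the wrapped difference gives the order. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

/* Small self-check covering the non-wrapped, wrapped, and permanent cases. */
void
xid_precedes_selfcheck(void)
{
	/* No wraparound involved. */
	assert(xid_precedes(100, 200));
	assert(!xid_precedes(200, 100));

	/* 4294967290 counts as older than 10 once the counter has wrapped. */
	assert(xid_precedes(4294967290u, 10));
	assert(!xid_precedes(10, 4294967290u));

	/* A permanent XID (here 2) falls back to unsigned comparison. */
	assert(xid_precedes(2, 4294967290u));
}
```

The other three functions differ only in the comparison operator applied to
diff, which is presumably why they are good candidates for static inline
definitions in the header.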



  [text/x-patch] v10-0020-Add-helper-functions-to-heap_page_prune_and_free.patch (18.9K, 20-v10-0020-Add-helper-functions-to-heap_page_prune_and_free.patch)
  download | inline diff:
From a0229c65dc6e59033c65f0a9efaf749a82678551 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v10 20/22] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
   heuristics, and state gathered during stage 2 whether or not to
   freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
 1 file changed, 295 insertions(+), 176 deletions(-)
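
Reviewer-style note (ignored by git am): the freeze half of the evaluation
stage can be read as a pure decision over a handful of booleans. The sketch
below restates that decision outside of PostgreSQL so the branches are easy
to follow. FreezeInputs and will_freeze_sketch() are hypothetical names; the
struct flattens values that the patch reads from PruneState, from the caller,
and from XLogHintBitIsNeeded() / XLogCheckBufferNeedsBackup(). It leaves out
the side effects of the real helper (heap_pre_freeze_checks(), resetting
nfrozen, and the LP_DEAD adjustment of all_visible/all_frozen).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs to the freeze decision. */
typedef struct FreezeInputs
{
	bool		attempt_freeze;		/* caller asked us to consider freezing */
	bool		freeze_required;	/* pagefrz.freeze_required */
	bool		all_visible;
	bool		all_frozen;
	int			nfrozen;			/* number of prepared freeze plans */
	bool		relation_needs_wal; /* RelationNeedsWAL(relation) */
	bool		did_tuple_hint_fpi;
	bool		do_prune;
	bool		do_hint_full_or_prunable;
	bool		hint_bits_wal_logged;	/* XLogHintBitIsNeeded() */
	bool		buffer_needs_fpi;	/* XLogCheckBufferNeedsBackup(buffer) */
} FreezeInputs;

/* Returns true if the prepared freeze plans should be executed. */
bool
will_freeze_sketch(const FreezeInputs *in)
{
	if (!in->attempt_freeze)
	{
		/* Same sanity check the patch makes before exiting early. */
		assert(!in->all_frozen && in->nfrozen == 0);
		return false;
	}

	/* An XID/MXID older than the cutoffs forces freezing. */
	if (in->freeze_required)
		return true;

	/* Opportunistic freezing only pays off if the page ends up all-frozen. */
	if (!(in->all_visible && in->all_frozen && in->nfrozen > 0))
		return false;
	if (!in->relation_needs_wal)
		return false;

	/* Freeze only if an FPI was already emitted or will be emitted anyway. */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_fpi;
	if (in->do_hint_full_or_prunable)
		return in->hint_bits_wal_logged && in->buffer_needs_fpi;
	return false;
}
```

Written this way, the ordering constraint in the commit message is visible:
do_prune, do_hint_full_or_prunable and did_tuple_hint_fpi are all inputs, so
they have to be settled before the freeze decision is made.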

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0fedcad24c9..1a1a551859b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -380,6 +396,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -770,20 +1029,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -794,182 +1063,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
-
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
-	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
-	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
 	/* Save these for the caller in case we later zero out vmflags */
 	presult->new_vmbits = vmflags;
 
-	/* Any error while applying the changes is critical */
+	/*
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
+	 */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0
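
Note on heap_page_will_update_vm() in the patch above: once corruption
handling and the on-access guard are set aside, the choice of which VM bits
to set depends on five booleans. The sketch below isolates that choice.
vm_bits_to_set() and should_set_pd_all_visible() are hypothetical names and
the parameters are flattened stand-ins for PruneState fields,
identify_and_fix_vm_corruption(), VM_ALL_FROZEN() and PageIsAllVisible();
the two bit values are the ones defined in visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/*
 * Hypothetical restatement of the flag selection. Returns the set of VM
 * bits the caller should set, or 0 if the VM should be left alone.
 */
uint8_t
vm_bits_to_set(bool fixed_corruption,	/* corruption was just repaired */
			   bool all_visible,
			   bool all_frozen,
			   bool blk_known_av,		/* VM already said all-visible */
			   bool vm_all_frozen)		/* VM already says all-frozen */
{
	uint8_t		flags = 0;

	/* all_frozen implies all_visible, as asserted in the patch. */
	assert(!all_frozen || all_visible);

	/* If corruption was fixed, do not update the VM any further. */
	if (fixed_corruption)
		return 0;

	/*
	 * Set bits only if something would change: either the page was not yet
	 * known all-visible, or it is newly all-frozen.
	 */
	if (all_visible && (!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		flags |= VISIBILITYMAP_ALL_VISIBLE;
		if (all_frozen)
			flags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return flags;
}

/* PD_ALL_VISIBLE is set only together with the VM, and only if not yet set. */
bool
should_set_pd_all_visible(uint8_t flags, bool page_is_all_visible)
{
	return flags != 0 && !page_is_all_visible;
}
```

One consequence worth checking in review: a page that is already known
all-visible and is not newly all-frozen yields no flags, so no VM update is
requested for it.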



  [text/x-patch] v10-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.3K, 21-v10-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 560aca329ffd7bfafcda4f73e339a8efb7dc9ae8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v10 19/22] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 67 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 +++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  4 +-
 16 files changed, 278 insertions(+), 40 deletions(-)
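
Reviewer-style note (ignored by git am): the pruneheap.c hunk below adds a
guard so that an on-access call which neither prunes nor freezes does not set
the VM when that would newly dirty the heap page or force a full-page image.
The following standalone sketch restates that guard. VmState and
apply_on_access_guard() are hypothetical names; the boolean parameters stand
in for reason == PRUNE_ON_ACCESS, BufferIsDirty() and
XLogCheckBufferNeedsBackup().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of PruneState relevant to the guard. */
typedef struct VmState
{
	bool		consider_update_vm;
	bool		all_visible;
	bool		all_frozen;
} VmState;

/*
 * If this is an on-access call that would do nothing but set the VM, give up
 * on the VM update when the heap page is clean (it would be newly dirtied)
 * or when WAL-logging the change would require a full-page image.
 */
void
apply_on_access_guard(VmState *st,
					  bool on_access,
					  bool do_prune, bool do_freeze,
					  bool buffer_is_dirty,
					  bool buffer_needs_fpi)
{
	if (on_access &&
		st->consider_update_vm &&
		st->all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_is_dirty || buffer_needs_fpi))
	{
		st->consider_update_vm = false;
		st->all_visible = st->all_frozen = false;
	}

	/* The invariant the patch asserts right after the guard. */
	assert(!st->all_frozen || st->all_visible);
}
```

Vacuum callers pass a different reason and are never affected by the guard,
and an on-access call that is pruning or freezing is dirtying and logging the
page anyway, so it is not affected either.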

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb16bb0cbbd..d07693b7075 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e6cf4df17c5..0fedcad24c9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -517,12 +529,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -534,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -883,12 +900,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
@@ -2279,8 +2314,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a singel WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_heap_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Time::HiRes qw(usleep);
 use Test::More;
+use Time::HiRes qw(usleep);
 
 if ($ENV{enable_injection_points} ne 'yes')
 {
@@ -296,6 +297,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
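
For readers following the executor changes above, the path from
es_modified_relids to the scan flag can be sketched outside the tree.
This is a minimal sketch under assumptions, not the patch's code: all
Sketch*/sketch_* names are hypothetical, a uint64_t stands in for the
Bitmapset, and only the 1 << 10 bit position of the VM-set flag is taken
from the diff. The other bit values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in flag bits. Only the VM_SET position (1 << 10) comes from the
 * patch; the remaining values are placeholders for this sketch.
 */
enum
{
	SKETCH_SO_TYPE_SEQSCAN = 1 << 0,
	SKETCH_SO_ALLOW_STRAT = 1 << 6,
	SKETCH_SO_ALLOW_SYNC = 1 << 7,
	SKETCH_SO_ALLOW_PAGEMODE = 1 << 8,
	SKETCH_SO_ALLOW_VM_SET = 1 << 10
};

/* Hypothetical stand-in for es_modified_relids: one bit per RT index. */
typedef uint64_t SketchRelidSet;

/* Record that the query modifies (or row-marks) the relation at 'rti'. */
SketchRelidSet
sketch_add_member(SketchRelidSet set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

/* Is the relation at 'rti' modified by the query? */
bool
sketch_is_member(int rti, SketchRelidSet set)
{
	assert(rti >= 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* Build sequential scan flags; allow VM setting only for unmodified rels. */
uint32_t
sketch_seqscan_flags(bool modifies_rel)
{
	uint32_t	flags = SKETCH_SO_TYPE_SEQSCAN |
		SKETCH_SO_ALLOW_STRAT | SKETCH_SO_ALLOW_SYNC |
		SKETCH_SO_ALLOW_PAGEMODE;

	if (!modifies_rel)
		flags |= SKETCH_SO_ALLOW_VM_SET;

	return flags;
}
```

In this model, a relation added to the set (as the patch does for result
relations and rowmarked relations) never gets the VM-set bit on its
scans, while any other scanned relation does.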



  [text/x-patch] v10-0021-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 22-v10-0021-Reorder-heap_page_prune_and_freeze-parameters.patch)
  download | inline diff:
From 3eb27edab1d7a5dd45e1678a1a8a5150d620706c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v10 21/22] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1a1a551859b..4377673e3a4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -649,6 +649,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -667,30 +676,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -703,13 +703,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7558ac697f1..99b9cab0974 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1991,11 +1991,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v10-0022-Set-pd_prune_xid-on-insert.patch (6.5K, 23-v10-0022-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From d6ea4c3af1d3b6ce45f0219b6bc233867629b21e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v10 22/22] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d07693b7075..02aa2383c50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index ff3ad8b4cd2..e7d7804871b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -470,6 +470,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -619,9 +625,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
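
The guards added around PageSetPrunable() in this patch can be modeled
in a few lines of standalone C. This is a sketch under assumptions, not
the tree's code: the Sketch*/sketch_* names are hypothetical, the page is
reduced to its pd_prune_xid field, and the setter is modeled on my
understanding of the existing macro (keep the oldest normal xid seen).
The comparison helper only handles normal xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/* Bootstrap and frozen xids sort below the first normal xid. */
bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Circular "a is older than b" comparison; valid for normal xids only. */
bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

/* A page reduced to the one header field this sketch cares about. */
typedef struct SketchPage
{
	SketchXid	pd_prune_xid;
} SketchPage;

/* Remember the oldest xid whose changes might make the page prunable. */
void
sketch_page_set_prunable(SketchPage *page, SketchXid xid)
{
	assert(sketch_xid_is_normal(xid));
	if (page->pd_prune_xid == SKETCH_INVALID_XID ||
		sketch_xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Guard in the style of the heap_multi_insert() hunk: skip the hint when
 * the page is being set frozen or when the xid is not a normal one.
 */
void
sketch_multi_insert_hint(SketchPage *page, SketchXid xid, bool all_frozen_set)
{
	if (!all_frozen_set && sketch_xid_is_normal(xid))
		sketch_page_set_prunable(page, xid);
}
```

The normal-xid check in the caller matters because the setter asserts on
it; in this model a bootstrap-mode insert would otherwise trip that
assertion.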



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 16:41  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-08 16:41 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
<[email protected]> wrote:
> yikes, you are right about the "reason" member. Attached 0002 removes
> it, and I'll go ahead and fix it in the back branches too.

I think changing this in the back-branches is a super-bad idea. If you
want, you can add a comment in the back-branches saying "oops, we
shipped a field that isn't used for anything", but changing the struct
definition is very likely to make 0 people happy and >0 people
unhappy. On the other hand, changing this in master is a good idea and
you should go ahead and do that before this creates any more
confusion.
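
One plausible reading of the concern, which the message does not spell
out, is binary compatibility: code already compiled against the released
header would keep using the old member offsets. A minimal sketch of that
effect follows. Both structs are hypothetical and are not the struct
under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as shipped in a release, with an unused member. */
typedef struct SketchStateShipped
{
	uint32_t	xid;
	uint8_t		reason;			/* never read by anything */
	uint8_t		flags;
	uint16_t	ntuples;
} SketchStateShipped;

/* The same struct after removing the unused member. */
typedef struct SketchStateTrimmed
{
	uint32_t	xid;
	uint8_t		flags;
	uint16_t	ntuples;
} SketchStateTrimmed;

/*
 * True if a binary built against the shipped header would look for
 * 'flags' at a different offset than a server built with the trimmed one.
 */
bool
sketch_flags_offset_changed(void)
{
	return offsetof(SketchStateShipped, flags) !=
		offsetof(SketchStateTrimmed, flags);
}
```

Here 'flags' moves even though nothing ever used 'reason', which is why
a comment in the released branches is the lower-risk option in this
model.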

-- 
Robert Haas
EDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 18:32  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-09-08 18:32 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Sep 8, 2025 at 12:41 PM Robert Haas <[email protected]> wrote:
>
> On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
> <[email protected]> wrote:
> > yikes, you are right about the "reason" member. Attached 0002 removes
> > it, and I'll go ahead and fix it in the back branches too.
>
> I think changing this in the back-branches is a super-bad idea. If you
> want, you can add a comment in the back-branches saying "oops, we
> shipped a field that isn't used for anything", but changing the struct
> definition is very likely to make 0 people happy and >0 people
> unhappy. On the other hand, changing this in master is a good idea and
> you should go ahead and do that before this creates any more
> confusion.

Yes, that makes 100% sense. It should have occurred to me. I've pushed
the commit to master. I didn't put an updated set of patches here in
case someone was already reviewing them, as nothing else has changed.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 18:54  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-08 18:54 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Sep 8, 2025 at 11:44 AM Melanie Plageman
<[email protected]> wrote:
> I pushed this, so rebased v10 is attached. I've added one new patch:
> 0002 adds ERRCODE_DATA_CORRUPTED to the existing log messages about
> VM/data corruption in vacuum. Andrey Borodin earlier suggested this,
> and I had neglected to include it.

Writing "ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED)" is very
much a minority position. Generally the call to errcode() is on the
following line. I think the commit message could use a bit of work,
too. The first sentence heavily duplicates the second and the fourth,
and the third sentence isn't sufficiently well-connected to the rest
to make it clear why you're restating this general principle in this
commit message.

Perhaps something like:

Add error codes when VACUUM discovers VM corruption

Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd and other previous
work has established the principle that when an error is potentially
reachable in case of on-disk corruption, but is not expected to be
reached otherwise, ERRCODE_DATA_CORRUPTED should be used. This allows
log monitoring software to search for evidence of corruption by
filtering on the error code.

That kibitzing aside, I think this is pretty clearly the right thing to do.
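
As an illustration of the log-monitoring point (a sketch only, not part of
the patch): assuming log_line_prefix includes %e, so that each log line
carries the SQLSTATE, and given that ERRCODE_DATA_CORRUPTED is reported as
SQLSTATE XX001, a monitoring tool could flag corruption reports with a check
along these lines. The helper name and the shape of the log line are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* SQLSTATE that ERRCODE_DATA_CORRUPTED is reported as. */
#define SQLSTATE_DATA_CORRUPTED "XX001"

/* A SQLSTATE is always five characters (plus the terminating NUL). */
static_assert(sizeof(SQLSTATE_DATA_CORRUPTED) == 6,
			  "SQLSTATE must be five characters");

/*
 * Hypothetical helper: return true if a server log line mentions the
 * data-corruption SQLSTATE.  Assumes log_line_prefix contains %e, so the
 * SQLSTATE appears literally somewhere in the line.
 */
static bool
log_line_reports_corruption(const char *line)
{
	return line != NULL && strstr(line, SQLSTATE_DATA_CORRUPTED) != NULL;
}
```

A plain substring match is the simplest possible filter; a real tool would
more likely parse the prefix fields so that message text containing "XX001"
by coincidence is not counted.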

--
Robert Haas
EDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 19:14  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-08 19:14 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Sep 8, 2025 at 2:54 PM Robert Haas <[email protected]> wrote:
>
> Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd and other previous
> work has established the principle that when an error is potentially
> reachable in case of on-disk corruption, but is not expected to be
> reached otherwise, ERRCODE_DATA_CORRUPTED should be used. This allows
> log monitoring software to search for evidence of corruption by
> filtering on the error code.
>
> That kibitzing aside, I think this is pretty clearly the right thing to do.

Thanks for the suggested wording and the pointer to that thread.

I noticed that in that thread they decided to use errmsg_internal()
instead of errmsg() for a few different reasons -- one of which was
that the situation is not supposed to happen/cannot happen -- which I
don't really understand. It is a reachable code path. Another is that
it is extra work for translators, which I'm also not sure how to apply
to my situation. Are the VM corruption cases worth extra work to the
translators?

I think the most compelling reason is that people will want to search
for the error message in English online. So, for that reason, my
instinct is to use errmsg_internal() in my case as well.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 19:53  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Robert Haas @ 2025-09-08 19:53 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Sep 8, 2025 at 3:14 PM Melanie Plageman
<[email protected]> wrote:
> I noticed that in that thread they decided to use errmsg_internal()
> instead of errmsg() for a few different reasons -- one of which was
> that the situation is not supposed to happen/cannot happen -- which I
> don't really understand. It is a reachable code path. Another is that
> it is extra work for translators, which I'm also not sure how to apply
> to my situation. Are the VM corruption cases worth extra work to the
> translators?
>
> I think the most compelling reason is that people will want to search
> for the error message in English online. So, for that reason, my
> instinct is to use errmsg_internal() in my case as well.

I don't find that reason particularly compelling -- people could want
to search for any error message, or they could equally want to be able
to read it without Google translate. Guessing which messages are
obscure enough that we need not translate them exceeds my powers. If I
were doing it, I'd make it errmsg() rather than errmsg_internal() and
let the translations team change it if they don't think it's worth
bothering with, because if you make it errmsg_internal() then they
won't see it until somebody complains about it not getting translated.
However, I suspect different committers would pursue different
strategies here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 20:14  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-08 20:14 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Reviewing 0003:

+               /*
+                * If we're only adding already frozen rows to a previously empty
+                * page, mark it as all-frozen and update the visibility map. We're
+                * already holding a pin on the vmbuffer.
+                */
                else if (all_frozen_set)
+               {
                        PageSetAllVisible(page);
+                       LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+                       visibilitymap_set_vmbits(relation,
+                                                BufferGetBlockNumber(buffer),
+                                                vmbuffer,
+                                                VISIBILITYMAP_ALL_VISIBLE |
+                                                VISIBILITYMAP_ALL_FROZEN);

Locking a buffer in a critical section violates the order of
operations proposed in the 'Write-Ahead Log Coding' section of
src/backend/access/transam/README.

+        * Now read and update the VM block. Even if we skipped updating the heap
+        * page due to the file being dropped or truncated later in recovery, it's
+        * still safe to update the visibility map.  Any WAL record that clears
+        * the visibility map bit does so before checking the page LSN, so any
+        * bits that need to be cleared will still be cleared.
+        *
+        * It is only okay to set the VM bits without holding the heap page lock
+        * because we can expect no other writers of this page.

The first paragraph of this paraphrases similar content in
heap_xlog_visible(), but I don't see the variation in phrasing as an
improvement.

The second paragraph does not convince me at all. I see no reason to
believe that this is safe, or that it is a good idea. The code in
heap_xlog_visible() thinks it's OK to unlock and relock the page to
make visibilitymap_set() happy, which is cringy but probably safe for
lack of concurrent writers, but skipping locking altogether seems
deeply unwise.

- *             visibilitymap_set        - set a bit in a previously pinned page
+ *             visibilitymap_set        - set bit(s) in a previously pinned page and log
+ *      visibilitymap_set_vmbits - set bit(s) in a pinned page

I suspect the indentation was done with a different mix of spaces and
tabs here, because this doesn't align for me.

In general, this idea makes some sense to me -- there doesn't seem to
be any particularly good reason why the visibility-map update should
be handled by a different WAL record than the all-visible flag on the
page itself. It's a little hard for me to make that statement too
conclusively without studying more of the patches than I've had time
to do today, but off the top of my head it seems to make sense.
However, I'm not sure you've taken enough care with the details here.

--
Robert Haas
EDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-08 22:28  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-08 22:28 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Sep 8, 2025 at 4:15 PM Robert Haas <[email protected]> wrote:
>
> Reviewing 0003:
>
> Locking a buffer in a critical section violates the order of
> operations proposed in the 'Write-Ahead Log Coding' section of
> src/backend/access/transam/README.

Right, I noticed some other callers of visibilitymap_set() (like
lazy_scan_new_or_empty()) did call it in a critical section (and it
exclusive locks the VM page), so I thought perhaps it was better to
keep this operation as close as possible to where we update the VM
(similar to how it is in master in visibilitymap_set()).

But, I think you're right that maintaining the order of operations
proposed in transam/README is more important. As such, in attached
v11, I've modified this patch and the other patches where I replace
visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
lock the vmbuffer before the critical section.
visibilitymap_set_vmbits() asserts that we have the vmbuffer
exclusively locked, so we should be good.
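
To make the ordering concrete, here is a toy model (not PostgreSQL code, and
not taken from the patch) of the sequence v11 follows for the VM buffer: take
the exclusive lock, enter the critical section, set the bits, emit WAL and
stamp the LSN, leave the critical section, then unlock. All names are
hypothetical stand-ins; the two assertions in toy_update_vm() loosely mirror
the ones visibilitymap_set_vmbits() makes outside of recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins for backend state; none of this is PostgreSQL code. */
static int	crit_section_count = 0;
static bool vm_locked = false;
static char trace[128];

/* Append one step name to the trace, never overflowing the buffer. */
static void
note(const char *step)
{
	strncat(trace, step, sizeof(trace) - strlen(trace) - 1);
}

/* Locking must happen before the critical section begins. */
static void
toy_lock_vmbuffer(void)
{
	assert(crit_section_count == 0);
	vm_locked = true;
	note("lock;");
}

/* Setting VM bits requires the lock and an open critical section. */
static void
toy_update_vm(void)
{
	assert(crit_section_count > 0);
	assert(vm_locked);
	note("set;");
}

/* Walk through the steps in order and return the recorded trace. */
static const char *
toy_multi_insert_page(void)
{
	trace[0] = '\0';

	toy_lock_vmbuffer();		/* stands in for LockBuffer(..., EXCLUSIVE) */

	crit_section_count++;		/* stands in for START_CRIT_SECTION() */
	note("crit;");

	toy_update_vm();			/* stands in for visibilitymap_set_vmbits() */
	note("wal;");				/* stands in for XLogInsert() + PageSetLSN() */

	crit_section_count--;		/* stands in for END_CRIT_SECTION() */
	note("end;");

	vm_locked = false;			/* stands in for LockBuffer(..., UNLOCK) */
	note("unlock;");

	return trace;
}
```

Moving the toy_lock_vmbuffer() call below the critical-section increment
trips the first assertion, which is the v10 ordering that was objected to.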

> +        * Now read and update the VM block. Even if we skipped updating the heap
> +        * page due to the file being dropped or truncated later in recovery, it's
> +        * still safe to update the visibility map.  Any WAL record that clears
> +        * the visibility map bit does so before checking the page LSN, so any
> +        * bits that need to be cleared will still be cleared.
> +        *
> +        * It is only okay to set the VM bits without holding the heap page lock
> +        * because we can expect no other writers of this page.
>
> The first paragraph of this paraphrases similar content in
> heap_xlog_visible(), but I don't see the variation in phrasing as an
> improvement.

The only difference is I replaced the phrase "LSN interlock" with
"being dropped or truncated later in recovery" -- which is more
specific and, I thought, more clear. Without this comment, it took me
some time to understand the scenarios that might lead us to skip
updating the heap block. heap_xlog_visible() has cause to describe
this situation in an earlier comment -- which is why I think the LSN
interlock comment is less confusing there.

Anyway, I'm open to changing the comment. I could:
1) copy-paste the same comment as heap_xlog_visible()
2) refer to the comment in heap_xlog_visible() (comment seemed a bit
short for that)
3) diverge the comments further by improving the new comment in
heap_xlog_multi_insert() in some way
4) something else?

> The second paragraph does not convince me at all. I see no reason to
> believe that this is safe, or that it is a good idea. The code in
> heap_xlog_visible() thinks it's OK to unlock and relock the page to
> make visibilitymap_set() happy, which is cringy but probably safe for
> lack of concurrent writers, but skipping locking altogether seems
> deeply unwise.

Actually in master, heap_xlog_visible() has no lock on the heap page
when it calls visibilitymap_set(). It releases that lock before
recording the freespace in the FSM and doesn't take it again.

It does unlock and relock the VM page -- because visibilitymap_set()
expects to take the lock on the VM.

I agree that not holding the heap lock while updating the VM is
unsatisfying. We can't hold it while doing the IO to read in the VM
block in XLogReadBufferForRedoExtended(). So, we could take it again
before calling visibilitymap_set(). However, we don't always have the heap
buffer. I suspect this is partially why heap_xlog_visible()
unconditionally passes InvalidBuffer to visibilitymap_set() as the
heap buffer and has special case handling for recovery when we don't
have the heap buffer.

In any case, it isn't an active bug, and I don't think future-proofing
VM replay (i.e. against parallel recovery) is a prerequisite for
committing this patch since it is also that way on master.

> - *             visibilitymap_set        - set a bit in a previously pinned page
> + *             visibilitymap_set        - set bit(s) in a previously pinned page and log
> + *      visibilitymap_set_vmbits - set bit(s) in a pinned page
>
> I suspect the indentation was done with a different mix of spaces and
> tabs here, because this doesn't align for me.

oops, fixed.

I pushed the ERRCODE_DATA_CORRUPTED patch, so attached v11 is rebased
and also has the changes mentioned above.

Since you've started reviewing the set, I'll note that patches
0005-0011 are split up for ease of review and it may not necessarily
make sense to keep that separation for eventual commit. They are a
series of steps to move VM updates from lazy_scan_prune() into
pruneheap.c.

- Melanie


Attachments:

  [text/x-patch] v11-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (11.8K, 2-v11-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
  download | inline diff:
From 1ce37296b97bb40e717b3dc1f2052da0b022fa78 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v11 01/20] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate xl_heap_visible WAL record to set the VM
bits, specify the changes to make to the VM block in the
xl_heap_multi_insert record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 50 +++++++++++-------
 src/backend/access/heap/heapam_xlog.c   | 43 +++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 147 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..cff531a4801 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2504,9 +2508,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2516,8 +2517,21 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2579,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2647,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2660,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..0820f7d052d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,46 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		visibilitymap_set_vmbits(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
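
For readers following the bit arithmetic in visibilitymap_set_vmbits(), the
following standalone sketch (not part of the patch) shows the same idea on a
plain byte array. It assumes two VM bits per heap block and the flag values
0x01 (all-visible) and 0x02 (all-frozen); it ignores page headers and the
split of the map across multiple VM pages, and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: two status bits per heap block, four blocks per byte. */
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE 4

#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02
#define TOY_VM_VALID_BITS	0x03

/*
 * Set the given flags for heap_blk in a flat byte array and return the
 * status bits that were there before.  Like the patch's function, this
 * only ever sets bits; it never clears them.
 */
static uint8_t
toy_set_vmbits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % TOY_HEAPBLOCKS_PER_BYTE) *
										TOY_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Flags must be valid, and all-frozen implies all-visible. */
	assert((flags & TOY_VM_VALID_BITS) == flags);
	assert(flags != TOY_VM_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & TOY_VM_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

For example, heap block 5 lands in byte 1 at bit offset 2, so setting both
flags on a zeroed map leaves that byte as 0x0C and the neighbouring bytes
untouched.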



  [text/x-patch] v11-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (28.3K, 3-v11-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
  download | inline diff:
From fff425a8f480f66dc61c61ffb2b15f679901331d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v11 03/20] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.

This can decrease the number of WAL records vacuum phase III emits by
as much as half.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 145 +++++++++++++++++----
 src/backend/access/heap/pruneheap.c    |  66 ++++++++--
 src/backend/access/heap/vacuumlazy.c   | 166 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |   6 +-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |  36 ++++--
 6 files changed, 332 insertions(+), 96 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0820f7d052d..11c11929ed9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +156,117 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, measure the page's freespace to later update the
+	 * freespace map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	/*
+	 * Read and update the VM block. Even if we skipped updating the heap page
+	 * due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that it is _only_ okay that we do not hold a lock on the heap page
+	 * because we are in recovery and can expect no other writers to clear
+	 * PD_ALL_VISIBLE before we are able to update the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
-			UnlockReleaseBuffer(buffer);
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+
+		FreeFakeRelcacheEntry(reln);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+	xlrec.flags = vmflags;
 
-	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a84bdfe0a9..51067264004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2852,8 +2854,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2864,6 +2869,23 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2883,6 +2905,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2892,7 +2925,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2901,39 +2937,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3600,40 +3609,85 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
  *
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of LP_DEAD items the caller knows about and
+ * whose index entries it has already removed. Vacuum will call this before
+ * setting those line pointers LP_UNUSED. So, if there are no new LP_DEAD
+ * items, then the page can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
  *
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
  *
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
  *
  * *logging_offnum will have the OffsetNumber of the current tuple being
  * processed for vacuum's error callback system.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal().
+ * If you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3661,9 +3715,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..439f33b8061 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
 
-/* to handle recovery conflict during logical decoding on standby */
-#define		XLHP_IS_CATALOG_REL			(1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define		XLHP_IS_CATALOG_REL			(1 << 2)
 
 /*
  * Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
  * marks LP_DEAD line pointers as unused without moving any tuple data, an
  * ordinary exclusive lock is sufficient.
  */
-#define		XLHP_CLEANUP_LOCK	       (1 << 2)
+#define		XLHP_CLEANUP_LOCK	       (1 << 3)
 
 /*
  * If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
  * there are no queries running for which the removed tuples are still
  * visible, or which still consider the frozen XIDs as running.
  */
-#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 3)
+#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
 
 /*
  * Indicates that an xlhp_freeze_plans sub-record and one or more
  * xlhp_freeze_plan sub-records are present.
  */
-#define		XLHP_HAS_FREEZE_PLANS		(1 << 4)
+#define		XLHP_HAS_FREEZE_PLANS		(1 << 5)
 
 /*
  * XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
  * indicate that xlhp_prune_items sub-records with redirected, dead, and
  * unused item offsets are present.
  */
-#define		XLHP_HAS_REDIRECTIONS		(1 << 5)
-#define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
-#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
+#define		XLHP_HAS_REDIRECTIONS		(1 << 6)
+#define		XLHP_HAS_DEAD_ITEMS	        (1 << 7)
+#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)
 
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
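
Reader's note on the flag renumbering in heapam_xlog.h above: the two low
bits of xl_heap_prune.flags now carry the VM bits, every XLHP_* flag moves up
by one position, and the highest flag lands on bit 8, which is why the field
widens from uint8 to uint16. The following is a minimal standalone sketch of
that layout, not code from the patch. The XLHP_* values are copied from the
diff; the VISIBILITYMAP_* values are assumed to match visibilitymapdefs.h;
prune_flags_get_vmbits() and prune_flags_fit_in_uint8() are hypothetical
helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VM bits; values assumed to match visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* XLHP_* bits as renumbered by the patch */
#define XLHP_IS_CATALOG_REL			(1 << 2)
#define XLHP_CLEANUP_LOCK			(1 << 3)
#define XLHP_HAS_CONFLICT_HORIZON	(1 << 4)
#define XLHP_HAS_FREEZE_PLANS		(1 << 5)
#define XLHP_HAS_REDIRECTIONS		(1 << 6)
#define XLHP_HAS_DEAD_ITEMS			(1 << 7)
#define XLHP_HAS_NOW_UNUSED_ITEMS	(1 << 8)

/* Hypothetical helper: extract the VM bits carried in the record flags. */
static uint8_t
prune_flags_get_vmbits(uint16_t flags)
{
	return (uint8_t) (flags & VISIBILITYMAP_VALID_BITS);
}

/* Hypothetical helper: would this flag set survive truncation to 8 bits? */
static bool
prune_flags_fit_in_uint8(uint16_t flags)
{
	return (uint16_t) (uint8_t) flags == flags;
}
```

With this layout, a record that only reaps LP_DEAD items and sets the page
all-visible and all-frozen has flags 0x0103, which no longer fits in one byte.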



  [text/x-patch] v11-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch (5.4K, 4-v11-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch)
From 94de583f0c7786ed49d27685055a1e3bd0cecb61 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v11 02/20] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..8a84bdfe0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2912,8 +2916,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3596,10 +3600,18 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3607,9 +3619,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3632,7 +3646,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3656,9 +3670,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3679,7 +3693,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3714,7 +3728,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0
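
Reader's note: elsewhere in this series, the function refactored above gains
a deadoffsets[] argument (as heap_page_would_be_all_visible()) so that vacuum
can ask whether the page would be all-visible once the LP_DEAD items it
already knows about are reaped. The matching relies on deadoffsets[] being
strictly sorted and on the page being scanned in offset order, so one counter
is enough. A minimal standalone sketch of just that matching step follows. It
is not the patch's code: dead_items_all_known() and its page_dead[] input (a
stand-in for the LP_DEAD offsets met while scanning the page) are hypothetical,
and OffsetNumber is redeclared locally so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;	/* local stand-in for the PostgreSQL typedef */

/*
 * Return true if every LP_DEAD offset found on the page (page_dead[], in
 * ascending offset order) is matched, in order, by the caller's known list
 * (known_dead[], strictly ascending).  Any dead item that is not the next
 * expected known offset means the page cannot be considered all-visible.
 */
static bool
dead_items_all_known(const OffsetNumber *page_dead, int npage_dead,
					 const OffsetNumber *known_dead, int nknown_dead)
{
	int			matched = 0;

	for (int i = 0; i < npage_dead; i++)
	{
		if (known_dead == NULL ||
			matched >= nknown_dead ||
			known_dead[matched] != page_dead[i])
			return false;
		matched++;
	}
	return true;
}
```

Passing NULL and 0 for the known list reproduces the stricter behavior of the
plain heap_page_is_all_visible() wrapper: any LP_DEAD item fails the check.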



  [text/x-patch] v11-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.4K, 5-v11-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch)
From 6ed211a90a2e3b384ba06d85ae183f513ca3ffc3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v11 05/20] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 126 +++++++++++++++++----------
 1 file changed, 79 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a1cdaaebb57..e9b4e924d22 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1934,6 +1940,72 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2080,9 +2152,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2134,51 +2211,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0
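
The two checks that the patch above factors into identify_and_fix_vm_corruption() can be modeled without any buffer or WAL machinery. The following is a minimal sketch, not the PostgreSQL code: PageModel, VmCheck, classify_vm_state() and fix_vm_state() are hypothetical names invented for illustration, and the booleans stand in for PageIsAllVisible() and visibilitymap_get_status().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the real function reads from buffers. */
typedef struct PageModel
{
    bool pd_all_visible;   /* models the page-level PD_ALL_VISIBLE bit */
    bool vm_bits_set;      /* models "any VM bit is set for this block" */
    int  nlpdead_items;    /* LP_DEAD items left on the page by pruning */
} PageModel;

typedef enum VmCheck
{
    VM_OK,
    VM_BIT_WITHOUT_PAGE_BIT,      /* VM bit set, page-level bit clear */
    LP_DEAD_ON_ALL_VISIBLE_PAGE   /* LP_DEAD items, page-level bit set */
} VmCheck;

/* Same order of checks as in the patch: stale VM bit first, then LP_DEAD. */
static VmCheck
classify_vm_state(bool heap_blk_known_av, const PageModel *p)
{
    assert(p != NULL);

    if (heap_blk_known_av && !p->pd_all_visible && p->vm_bits_set)
        return VM_BIT_WITHOUT_PAGE_BIT;
    if (p->nlpdead_items > 0 && p->pd_all_visible)
        return LP_DEAD_ON_ALL_VISIBLE_PAGE;
    return VM_OK;
}

/* Returns true if something was cleared, mirroring the patch's return value. */
static bool
fix_vm_state(bool heap_blk_known_av, PageModel *p)
{
    switch (classify_vm_state(heap_blk_known_av, p))
    {
        case VM_BIT_WITHOUT_PAGE_BIT:
            p->vm_bits_set = false;
            return true;
        case LP_DEAD_ON_ALL_VISIBLE_PAGE:
            p->pd_all_visible = false;
            p->vm_bits_set = false;
            return true;
        default:
            return false;
    }
}
```

In this model, as in the patch, a true return tells the caller to skip the normal VM update for the page.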



  [text/x-patch] v11-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch (5.9K, 6-v11-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch)
From 372ba8cb2a0f1b234db1b2dca929ae025d43a034 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v11 04/20] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51067264004..a1cdaaebb57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,49 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			visibilitymap_set_vmbits(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2925,6 +2941,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0
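
The buffer-registration decision that the patch above changes in log_heap_prune_and_freeze() can be read as a small pure function. This is a sketch under stated assumptions: choose_regbuf_flags() and the SK_ constants are hypothetical stand-ins (the real REGBUF_* flags are defined in access/xloginsert.h), and hint_bits_need_wal models the result of XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only. */
#define SK_REGBUF_FORCE_IMAGE   0x01
#define SK_REGBUF_NO_IMAGE      0x02
#define SK_REGBUF_STANDARD      0x08

static uint8_t
choose_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
                    bool set_pd_all_vis, bool hint_bits_need_wal)
{
    uint8_t flags = SK_REGBUF_STANDARD;

    assert(nfrozen >= 0);

    /* A forced full-page image takes precedence over the no-image case. */
    if (force_heap_fpi)
        flags |= SK_REGBUF_FORCE_IMAGE;

    /*
     * Otherwise skip the image when the heap page is not pruned or frozen
     * and any PD_ALL_VISIBLE change does not have to be WAL-logged.
     */
    else if (!do_prune && nfrozen == 0 &&
             (!set_pd_all_vis || !hint_bits_need_wal))
        flags |= SK_REGBUF_NO_IMAGE;

    return flags;
}
```

Making the second branch an else keeps the two image flags mutually exclusive, which appears to be why the patch turns the existing if into an else if.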



  [text/x-patch] v11-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (3.1K, 7-v11-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch)
From 76c45ff7622f5c7859ea09eb65c1c552ab6b3ec1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v11 08/20] Keep all_frozen updated too in
 heap_page_prune_and_freeze

Previously, all_frozen was only valid when all_visible was also set,
because we did not clear all_frozen every time we cleared all_visible.
Keep both flags up to date instead.

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 54af3296b91..bbd83e4fcc7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -830,6 +826,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1474,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1496,7 +1493,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1509,7 +1506,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1528,7 +1525,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1546,7 +1543,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0
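
The invariant that the patch above starts to maintain, and checks with its new Assert(), is that all_frozen implies all_visible. A minimal sketch of that idea follows; VisState, mark_not_all_visible() and vis_state_is_consistent() are hypothetical names used only for illustration and do not appear in pruneheap.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two PruneState flags touched by the patch. */
typedef struct VisState
{
    bool all_visible;
    bool all_frozen;
} VisState;

/* Clear both flags together, as each changed line in the patch does. */
static void
mark_not_all_visible(VisState *s)
{
    s->all_visible = s->all_frozen = false;
}

/* Equivalent of the new check: !all_frozen || all_visible. */
static bool
vis_state_is_consistent(const VisState *s)
{
    return !s->all_frozen || s->all_visible;
}
```

Clearing both flags at every site means later code can test all_frozen on its own, which matches the stated goal of separating the two fields in future commits.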



  [text/x-patch] v11-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (12.1K, 8-v11-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch)
From 34188dade43706d764392ef68af82c1e6deb663a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v11 07/20] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().

This commit is only really meant for review, as it adds a member to
PruneFreezeResult (vm_corruption) that is removed in later commits.
---
 src/backend/access/heap/pruneheap.c  | 93 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 84 +++----------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 102 insertions(+), 79 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..54af3296b91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,70 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +387,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +426,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +976,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1febb524d41..574e415b0e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,72 +1934,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2063,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2152,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0



  [text/x-patch] v11-0006-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 9-v11-0006-Combine-vacuum-phase-I-VM-update-cases.patch)
From 8791f69380c4b30c85a590bf697440efa064ac7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v11 06/20] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9b4e924d22..1febb524d41 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2159,11 +2159,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2176,21 +2191,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the returned old_vmbits value to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2211,66 +2234,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v11-0009-Update-VM-in-pruneheap.c.patch (12.5K, 10-v11-0009-Update-VM-in-pruneheap.c.patch)
From 04abbbd76bded07b80a48fb7af9e30bc8cca93a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v11 09/20] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 106 deletions(-)
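
The decision that this patch moves into heap_page_prune_and_freeze() can be sketched in isolation. This is a minimal standalone model, not the patch's code: VmDecision, decide_vm_update(), the Xid typedef and the VM_* constants are hypothetical stand-ins for this note, and the real function also sets PD_ALL_VISIBLE, dirties the buffer and calls visibilitymap_set().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the VM flag bits; names and values are assumed for this sketch. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

typedef uint32_t Xid;			/* hypothetical stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

/* Hypothetical result type: which bits to set and which horizon to log. */
typedef struct VmDecision
{
	uint8_t		vmflags;		/* 0 means "do not touch the VM" */
	Xid			conflict_horizon;
} VmDecision;

/*
 * Model of the branch structure in the pruneheap.c hunk:
 *  - a fully frozen page needs no conflict horizon;
 *  - if VM corruption was just fixed, do not update the VM further;
 *  - otherwise set bits only if the page is all-visible and the VM is
 *    missing either the all-visible bit or the all-frozen bit.
 */
VmDecision
decide_vm_update(bool fixed_corruption, bool all_visible, bool all_frozen,
				 bool blk_known_av, bool vm_already_frozen, Xid newest_xmin)
{
	VmDecision	d = {0, INVALID_XID};

	d.conflict_horizon = all_frozen ? INVALID_XID : newest_xmin;

	if (fixed_corruption)
		return d;

	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_already_frozen)))
	{
		d.vmflags = VM_ALL_VISIBLE;
		if (all_frozen)
			d.vmflags |= VM_ALL_FROZEN;
	}

	/* all-frozen is never requested without all-visible */
	assert((d.vmflags & VM_ALL_FROZEN) == 0 ||
		   (d.vmflags & VM_ALL_VISIBLE) != 0);
	return d;
}
```

For example, a page that is all-visible and all-frozen but only marked all-visible in the VM yields both bits and an invalid conflict horizon, while a page already marked all-frozen yields vmflags == 0.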

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bbd83e4fcc7..398962ed1cb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,7 +366,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -442,6 +443,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -942,7 +945,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -958,31 +961,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 574e415b0e0..9492423141e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,7 +1949,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1979,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,88 +2079,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM. Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
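
The counter logic left behind in lazy_scan_prune() by the patch above depends only on old_vmbits and new_vmbits, so it can be exercised on its own. A minimal sketch under stated assumptions: VmCounters and count_vm_change() are hypothetical names for this note, and the VM_* constants stand in for the VISIBILITYMAP_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the VM flag bits; names and values are assumed for this sketch. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical counters mirroring the vacrel fields used for logging. */
typedef struct VmCounters
{
	int64_t		new_visible_pages;
	int64_t		new_visible_frozen_pages;
	int64_t		new_frozen_pages;
} VmCounters;

/*
 * Classify one page from the bits the VM held before the update
 * (old_vmbits) and the bits requested by the update (new_vmbits, where 0
 * means no update happened).  Returns true if the page was newly set
 * all-frozen, which is what *vm_page_frozen reports in the patch.
 */
bool
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & VM_ALL_VISIBLE) != 0)
	{
		/* newly all-visible, and possibly all-frozen at the same time */
		c->new_visible_pages++;
		if ((new_vmbits & VM_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			return true;
		}
		return false;
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & VM_ALL_FROZEN) != 0)
	{
		/* already all-visible, newly all-frozen */
		assert((new_vmbits & VM_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		return true;
	}

	/* nothing newly set, including the new_vmbits == 0 case */
	return false;
}
```

Because new_vmbits stays 0 when no VM update was attempted (including the corruption-fix path), that case falls through both branches and leaves every counter unchanged.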



  [text/x-patch] v11-0010-Rename-PruneState.freeze-to-attempt_freeze.patch (3.7K, 11-v11-0010-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 133c61abb24a832033d973fd2509230a68cb9b9d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v11 10/20] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately freeze them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 398962ed1cb..df3e6439176 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -445,13 +445,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -473,7 +473,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -520,7 +520,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -634,7 +634,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -750,7 +750,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -783,7 +783,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1046,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->old_vmbits = old_vmbits;
 	presult->new_vmbits = vmflags;
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1628,7 +1628,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v11-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 12-v11-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 74e196584204f9554c2425bc7be9ed9e1a9821fc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v11 13/20] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all when determining whether the page can be set
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6637966e927..0211effeec7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -580,9 +580,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1182,11 +1182,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v11-0014-Use-GlobalVisState-to-determine-page-level-visib.patch (10.8K, 13-v11-0014-Use-GlobalVisState-to-determine-page-level-visib.patch)
From 1e836a4bca61d6ab748a9d1c43dfcef6e0b06f81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v11 14/20] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. Then, if we
have maintained the visibility_cutoff_xid, we compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, as we may
have set all_visible to false sooner when encountering a live tuple
newer than OldestXmin. However, these extra comparisons were found not
to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 19 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 39 deletions(-)
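
The once-per-page check described in the commit message can be modeled as follows. This is a simplified sketch, not the patch's code: SimpleHorizon, xid_visible_to_all() and page_all_visible() are hypothetical names, the single-bound horizon is an assumption (the real GlobalVisState tracks maybe_needed and definitely_needed bounds), and every xmin passed in is assumed to belong to a live, committed tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;			/* hypothetical stand-in for TransactionId */

#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

/* Wraparound-aware "a is newer than b" for normal XIDs (modulo-2^32). */
bool
xid_follows(Xid a, Xid b)
{
	return (int32_t) (a - b) > 0;
}

bool
xid_is_normal(Xid x)
{
	return x >= FIRST_NORMAL_XID;
}

/* Assumed single-bound horizon: XIDs older than 'horizon' are visible to all. */
typedef struct SimpleHorizon
{
	Xid			horizon;
} SimpleHorizon;

bool
xid_visible_to_all(const SimpleHorizon *h, Xid xid)
{
	return xid_follows(h->horizon, xid);
}

/*
 * Per tuple, only track the newest normal xmin (cheap comparison).  After
 * the loop, test that one value against the horizon: if the newest xmin is
 * visible to everyone, so is every older xmin on the page.
 */
bool
page_all_visible(const SimpleHorizon *h, const Xid *xmins, size_t ntuples,
				 Xid *cutoff_out)
{
	Xid			cutoff = INVALID_XID;

	assert(xmins != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		if (xid_is_normal(xmins[i]) &&
			(cutoff == INVALID_XID || xid_follows(xmins[i], cutoff)))
			cutoff = xmins[i];
	}

	*cutoff_out = cutoff;

	/* One horizon test per page instead of one per tuple. */
	if (xid_is_normal(cutoff) && !xid_visible_to_all(h, cutoff))
		return false;
	return true;
}
```

The trade-off matches the one the commit message notes: the loop no longer stops at the first too-new xmin, so it may look at more tuples, but the comparatively expensive horizon test runs at most once per page.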

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0211effeec7..c6935e45cec 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -559,14 +558,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -762,6 +759,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1108,12 +1115,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1638,19 +1643,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2dcca071a45..4ad05ba4db6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2717,7 +2717,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3462,13 +3462,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3487,7 +3487,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3508,7 +3508,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3580,8 +3580,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3600,8 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v11-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch (29.2K, 14-v11-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch)
From 029312b2d8a64782179df1bced1545bec1675211 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v11 11/20] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 459 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 282 insertions(+), 222 deletions(-)
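
Reviewer's aid (not part of the patch): the conflict horizon selection in
the pruneheap.c hunk below can be hard to follow in diff form, so here is a
minimal standalone sketch of the same decision order. It is a simplified
model, not the PostgreSQL code. The names Xid, PruneDecision, xid_follows,
xid_retreat and choose_conflict_xid are hypothetical stand-ins invented for
this illustration, and the struct only mirrors the local variables the hunk
consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and its special values. */
typedef uint32_t Xid;
#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)

/* Hypothetical bundle of the inputs the patch consults. */
typedef struct PruneDecision
{
	bool	do_prune;
	bool	do_freeze;
	bool	do_set_vm;
	bool	all_frozen_except_lp_dead;
	bool	blk_known_av;			/* VM already said all-visible */
	bool	setting_all_frozen;		/* vmflags has the all-frozen bit */
	Xid		visibility_cutoff_xid;	/* newest xmin of live tuples */
	Xid		oldest_xmin;			/* cutoffs->OldestXmin */
	Xid		latest_xid_removed;		/* newest xid removed by pruning */
} PruneDecision;

/* Modulo-2^32 comparison for normal xids, plain comparison otherwise. */
static bool
xid_follows(Xid a, Xid b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one xid, skipping over the special (non-normal) values. */
static Xid
xid_retreat(Xid x)
{
	do
		x--;
	while (x < FIRST_NORMAL_XID);
	return x;
}

static Xid
choose_conflict_xid(const PruneDecision *d)
{
	Xid		conflict = INVALID_XID;

	assert(d != NULL);

	/* Setting the VM, or freezing a page that ends up all-frozen. */
	if (d->do_set_vm || (d->do_freeze && d->all_frozen_except_lp_dead))
		conflict = d->visibility_cutoff_xid;
	/* Freezing only part of the page: pessimistic OldestXmin - 1. */
	else if (d->do_freeze)
		conflict = xid_retreat(d->oldest_xmin);

	/* Pruning may have removed something newer than that. */
	if (xid_follows(d->latest_xid_removed, conflict))
		conflict = d->latest_xid_removed;

	/* Already all-visible page only being set all-frozen: no conflict. */
	if (!d->do_prune && !d->do_freeze && d->do_set_vm &&
		d->blk_known_av && d->setting_all_frozen)
		conflict = INVALID_XID;

	return conflict;
}
```

Under this model, a record that only sets a page all-visible carries the
visibility cutoff xid, a partial freeze falls back to OldestXmin - 1, and a
page that was already all-visible and is only being set all-frozen carries
no horizon at all.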

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index df3e6439176..dce9025d268 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -377,12 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that old_vmbits and new_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -398,6 +408,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -442,18 +454,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -498,50 +516,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If neither freezing nor updating the VM, we skip this bookkeeping.
+	 * Initializing both flags to false allows skipping the work to update
+	 * them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -739,10 +764,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -790,7 +816,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -829,11 +855,88 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -849,15 +952,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -871,12 +975,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -888,35 +1027,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with a younger xmax than the
+			 * conflict_xid calculated so far, we must use that xmax instead.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -928,124 +1088,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM. Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1627,7 +1718,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
@@ -2190,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a singel WAL record for them
@@ -2244,6 +2340,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || (vmflags & VISIBILITYMAP_VALID_BITS) != 0);
 	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 	xlrec.flags = vmflags;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9492423141e..75205179b83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2015,34 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2076,8 +2048,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0



  [text/x-patch] v11-0012-Remove-xl_heap_visible-entirely.patch (24.0K, 15-v11-0012-Remove-xl_heap_visible-entirely.patch)
  download | inline diff:
From 061b3cfcc586895787a1d682156f73dc6a9705a4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v11 12/20] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 152 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 32 insertions(+), 364 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cff531a4801..6f161a6eab2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2526,11 +2527,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8801,49 +8802,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11c11929ed9..ff3ad8b4cd2 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
 
 	/*
 	 * After xl_heap_prune is the optional snapshot conflict horizon.
@@ -250,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -269,142 +271,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -785,15 +651,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
-
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
@@ -1374,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dce9025d268..6637966e927 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -989,8 +989,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			Assert(PageIsAllVisible(page));
-			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75205179b83..2dcca071a45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,8 +1888,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			visibilitymap_set_vmbits(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2757,9 +2757,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		Assert(!PageIsAllVisible(page));
 		set_pd_all_vis = true;
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 439f33b8061..3342af02c75 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -344,13 +344,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -455,9 +448,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
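
[Illustrative aside, not part of the patch.] After this patch the only
remaining visibilitymap_set() just sets bits in an already pinned and locked
VM page and returns the previous bits, leaving WAL logging and LSN updates to
the caller. A minimal sketch of that contract follows, assuming a flat byte
array in place of a real VM buffer page. The names vm_set_sketch and the VM_*
constants are hypothetical stand-ins, not PostgreSQL identifiers; only the
two-bits-per-heap-block layout and the bit values come from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VISIBILITYMAP_* flags shown in the patch. */
#define VM_ALL_VISIBLE      0x01
#define VM_ALL_FROZEN       0x02
#define VM_VALID_BITS       0x03

/* Two bits per heap block, so four heap blocks share one map byte. */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE 4

/*
 * Set 'flags' for heap block 'heap_blk' in a flat map (an assumption: the
 * real code works on the contents of a pinned, exclusively locked buffer).
 * Returns the bits that were set before the call, so the caller can tell
 * whether the map actually changed.
 */
static uint8_t
vm_set_sketch(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	size_t		map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	unsigned	shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		old;

	assert((flags & VM_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible. */
	assert(flags != VM_ALL_FROZEN);

	old = (map[map_byte] >> shift) & VM_VALID_BITS;
	if (old != flags)
		map[map_byte] |= (uint8_t) (flags << shift);

	return old;
}
```

A caller can compare the return value with the flags it passed, as
heap_xlog_prune_freeze() does with old_vmbits, and skip setting the VM page
LSN when nothing changed.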



  [text/x-patch] v11-0015-Inline-TransactionIdFollows-Precedes.patch (5.0K, 16-v11-0015-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From 67cd0164f81fc9612875edd024b917ed79707b83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v11 15/20] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
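
A note on the comparison helpers inlined by the patch above: they treat
two normal XIDs as points on a modulo-2^32 circle and compare them via a
signed 32-bit difference, falling back to plain unsigned comparison when
either XID is a permanent (non-normal) one. Below is a minimal standalone
sketch of that rule. XidSketch, FIRST_NORMAL_XID_SKETCH and the
*_sketch() functions are hypothetical stand-ins for TransactionId,
FirstNormalTransactionId and TransactionIdPrecedes(); this is an
illustration, not code from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and FirstNormalTransactionId. */
typedef uint32_t XidSketch;
#define FIRST_NORMAL_XID_SKETCH ((XidSketch) 3)

static bool
xid_is_normal_sketch(XidSketch xid)
{
	return xid >= FIRST_NORMAL_XID_SKETCH;
}

/* Is id1 logically < id2?  Same shape as the inlined helper in the patch. */
static bool
xid_precedes_sketch(XidSketch id1, XidSketch id2)
{
	int32_t		diff;

	/* Permanent XIDs are compared as plain unsigned integers. */
	if (!xid_is_normal_sketch(id1) || !xid_is_normal_sketch(id2))
		return id1 < id2;

	/*
	 * Unsigned subtraction wraps modulo 2^32; reinterpreting the result as
	 * signed (two's complement assumed, as on mainstream compilers) tells
	 * us which way around the circle is shorter.
	 */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

/* A few checks that capture the wraparound behaviour. */
void
xid_precedes_sketch_selfcheck(void)
{
	assert(xid_precedes_sketch(100, 200));
	assert(xid_precedes_sketch(4294967290u, 5));	/* 5 is "after" wraparound */
	assert(!xid_precedes_sketch(5, 4294967290u));
	assert(xid_precedes_sketch(2, 4294967290u));	/* permanent XID: unsigned */
}
```

The second check is the interesting one: numerically 4294967290 is
larger than 5, but the signed difference is -11, so it logically
precedes 5.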



  [text/x-patch] v11-0016-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 17-v11-0016-Unset-all-visible-sooner-if-not-freezing.patch)
From 6a4f009579e3067371d50bb85080243a26fd333f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v11 16/20] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
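For illustration only, a minimal standalone sketch of the behaviour
described in the commit message above. PruneStateSketch and both
functions are hypothetical, simplified stand-ins for PruneState and the
heap_prune_record_*dead() routines; finish_page_sketch() assumes the
existing end-of-pruning step that clears the flags when dead items
remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced version of the pruning state. */
typedef struct PruneStateSketch
{
	bool		attempt_freeze;	/* caller may opportunistically freeze */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} PruneStateSketch;

/* Called for every LP_DEAD (or newly dead) item found on the page. */
static void
record_dead_item_sketch(PruneStateSketch *ps)
{
	/*
	 * When freezing may be attempted, keep the flags so that a page that
	 * is frozen apart from its dead items can still be frozen.  Otherwise
	 * clear them right away and skip further bookkeeping.
	 */
	if (!ps->attempt_freeze)
		ps->all_visible = ps->all_frozen = false;

	ps->lpdead_items++;
}

/* Assumed end-of-pruning step: dead items always rule out all-visible. */
static void
finish_page_sketch(PruneStateSketch *ps)
{
	if (ps->lpdead_items > 0)
		ps->all_visible = ps->all_frozen = false;

	assert(!ps->all_frozen || ps->all_visible);
}
```

In the vacuum-like case (attempt_freeze = true) the flags survive
record_dead_item_sketch() and are only cleared in finish_page_sketch();
in the on-access-like case they are cleared on the first dead item.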
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c6935e45cec..ba8ddc7fa35 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1503,8 +1503,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1762,8 +1765,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v11-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.1K, 18-v11-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 903038c95c425ffaf35925483ae3f3a4c010f5a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v11 17/20] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
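For illustration only, a minimal standalone sketch of how the
"does the query modify this relation" fact flows from the executor to
the scan flags. RelidSetSketch is a hypothetical 64-bit stand-in for the
Bitmapset in es_modified_relids, and the SKETCH_* constants are
placeholders; only bit 10 for the VM flag mirrors the value this patch
gives SO_ALLOW_VM_SET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SO_TYPE_SEQSCAN	(1u << 0)	/* placeholder value */
#define SKETCH_SO_ALLOW_VM_SET	(1u << 10)	/* bit used by the patch */

/* Hypothetical stand-in for a Bitmapset of range table indexes. */
typedef uint64_t RelidSetSketch;

/* Executor side: remember that range table entry rti is modified. */
static RelidSetSketch
relids_add_sketch(RelidSetSketch set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static bool
relids_is_member_sketch(int rti, RelidSetSketch set)
{
	assert(rti > 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* Scan side: only scans of unmodified relations may set the VM. */
static uint32_t
seqscan_flags_sketch(int scanrelid, RelidSetSketch modified_relids)
{
	uint32_t	flags = SKETCH_SO_TYPE_SEQSCAN;

	if (!relids_is_member_sketch(scanrelid, modified_relids))
		flags |= SKETCH_SO_ALLOW_VM_SET;

	return flags;
}
```

For a query that updates range table entry 2 while also scanning entry
1, only the scan of entry 1 gets the VM flag.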
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 67 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 +++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 277 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6f161a6eab2..f9e50d47aee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba8ddc7fa35..69d8e42bdc8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -519,12 +531,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -536,7 +553,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -885,12 +902,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning or
+	 * freezing, avoid setting the visibility map if doing so would newly
+	 * dirty the heap page or, if the page is already dirty, would require a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
@@ -2284,8 +2319,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
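
A note on the on-access guard that the patch above adds to
heap_page_prune_and_freeze(): the condition is easier to review as a
pure predicate. The sketch below is a standalone restatement under the
assumption that each boolean parameter corresponds to the like-named
test in the patch (buffer_dirty for BufferIsDirty(), needs_fpi for
XLogCheckBufferNeedsBackup()). The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when an on-access caller should give up on setting the VM:
 * nothing is being pruned or frozen, and setting the VM would either dirty
 * a clean heap page or force a full-page image into the WAL.
 */
static bool
skip_vm_set_on_access_sketch(bool on_access, bool consider_update_vm,
							 bool all_visible, bool do_prune, bool do_freeze,
							 bool buffer_dirty, bool needs_fpi)
{
	return on_access &&
		consider_update_vm &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi);
}

/* The cases called out in the patch comment. */
void
skip_vm_set_on_access_selfcheck(void)
{
	/* clean page: setting the VM would newly dirty it */
	assert(skip_vm_set_on_access_sketch(true, true, true, false, false,
										false, false));
	/* dirty page, but an FPI would be needed */
	assert(skip_vm_set_on_access_sketch(true, true, true, false, false,
										true, true));
	/* dirty page, no FPI needed: go ahead and set the VM */
	assert(!skip_vm_set_on_access_sketch(true, true, true, false, false,
										 true, false));
}
```

Vacuum never takes this path, since the first operand is false for any
caller other than on-access pruning.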



  [text/x-patch] v11-0020-Set-pd_prune_xid-on-insert.patch (6.5K, 19-v11-0020-Set-pd_prune_xid-on-insert.patch)
From 1f1a222f13f55c0c8e4c66fe5075b0bd3f7f1949 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v11 20/20] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which affects the number of hits reported in the
index-killtuples isolation test. This is a quirk of how hits are
tracked, which sometimes leads to them being double counted. This should
probably be fixed or changed independently.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9e50d47aee..09d97896c66 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2548,8 +2552,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index ff3ad8b4cd2..e7d7804871b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -470,6 +470,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -619,9 +625,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
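
For readers following the pd_prune_xid change, the effect of the new PageSetPrunable() calls can be sketched in isolation. This is a simplified standalone model, not PostgreSQL source: SketchPage, xid_is_normal(), xid_precedes(), sketch_set_prunable(), and sketch_insert_hint() are hypothetical names, and the page is reduced to the single field of interest. It assumes the hint keeps the oldest normal XID seen, and it mirrors the TransactionIdIsNormal() guard the patch adds in heap_insert() so that bootstrap-mode XIDs are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for a heap page header: only the prune hint. */
typedef struct SketchPage
{
    TransactionId pd_prune_xid;
} SketchPage;

/* XIDs below FirstNormalTransactionId are special (invalid, bootstrap, frozen). */
static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Circular (modulo 2^32) comparison; only meaningful for two normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Model of the hint update: remember the oldest XID that may become prunable. */
static void
sketch_set_prunable(SketchPage *page, TransactionId xid)
{
    assert(xid_is_normal(xid));
    if (page->pd_prune_xid == InvalidTransactionId ||
        xid_precedes(xid, page->pd_prune_xid))
        page->pd_prune_xid = xid;
}

/* Model of the guard added in heap_insert(): skip non-normal (bootstrap) XIDs. */
static void
sketch_insert_hint(SketchPage *page, TransactionId xid)
{
    if (xid_is_normal(xid))
        sketch_set_prunable(page, xid);
}
```

In this model a later insert by a newer XID leaves the hint untouched, so the page becomes a candidate for on-access pruning as soon as the oldest inserter is old enough.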



  [text/x-patch] v11-0018-Add-helper-functions-to-heap_page_prune_and_free.patch (19.2K, 20-v11-0018-Add-helper-functions-to-heap_page_prune_and_free.patch)
From fbf27c16f0add32135836ea843cdbc1b8fc4aa44 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v11 18/20] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup -- where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
   heuristics, and state gathered during stage 2 whether or not to
   freeze tuples and set the page in the VM
4) execution -- where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

XXX: For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 473 +++++++++++++++++-----------
 1 file changed, 296 insertions(+), 177 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 69d8e42bdc8..67b56e45ad7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -382,6 +398,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output parameters. vmflags and
+ * set_pd_all_visible are output parameters.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -772,20 +1031,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -796,186 +1065,36 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
 
-	/* Lock vmbuffer before entering a critical section */
+	/* Lock vmbuffer before entering critical section */
 	if (do_set_vm)
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
 	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
-
-	/* Save these for the caller in case we later zero out vmflags */
-	presult->new_vmbits = vmflags;
-
-	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0
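
The two decisions that 0018 moves into helpers can be modeled as pure functions, which may make the branch structure easier to review. This is a standalone sketch, not PostgreSQL source: EvalInputs, sketch_will_freeze(), and sketch_vm_flags() are hypothetical names, and the booleans stand in for calls such as XLogCheckBufferNeedsBackup(), XLogHintBitIsNeeded(), BufferIsDirty(), and VM_ALL_FROZEN(). It deliberately omits the side effects of the real helpers (resetting nfrozen and all_frozen, clearing all_visible when LP_DEAD items remain, and the corruption repair itself), so all_visible and all_frozen here are assumed to already reflect that adjustment when passed to sketch_vm_flags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical flattened inputs; the patch reads these from PruneState etc. */
typedef struct EvalInputs
{
    bool attempt_freeze;            /* caller asked for freezing */
    bool freeze_required;           /* an XID/MXID is older than the cutoffs */
    bool all_visible;
    bool all_frozen;
    int  nfrozen;                   /* number of prepared freeze plans */
    bool relation_needs_wal;
    bool did_tuple_hint_fpi;        /* a hint-bit FPI was already emitted */
    bool do_prune;
    bool do_hint_full_or_prunable;
    bool hint_bits_wal_logged;      /* stands in for XLogHintBitIsNeeded() */
    bool buffer_needs_fpi;          /* stands in for XLogCheckBufferNeedsBackup() */
    bool consider_update_vm;        /* caller asked for VM updates */
    bool on_access;                 /* reason == PRUNE_ON_ACCESS */
    bool buffer_dirty;
    bool corruption_fixed;          /* VM corruption was found and cleared */
    bool blk_known_av;              /* caller already saw all-visible set */
    bool vm_all_frozen;             /* all-frozen bit currently set in the VM */
} EvalInputs;

/* Model of the freeze decision: mandatory first, then the FPI heuristic. */
static bool
sketch_will_freeze(const EvalInputs *in)
{
    if (!in->attempt_freeze)
        return false;
    if (in->freeze_required)
        return true;
    /* Opportunistic freezing only pays off if the page ends up all-frozen. */
    if (!(in->all_visible && in->all_frozen && in->nfrozen > 0))
        return false;
    if (!in->relation_needs_wal)
        return false;
    if (in->did_tuple_hint_fpi)
        return true;
    if (in->do_prune)
        return in->buffer_needs_fpi;
    if (in->do_hint_full_or_prunable)
        return in->hint_bits_wal_logged && in->buffer_needs_fpi;
    return false;
}

/* Model of the VM decision: returns the bits to set, or 0 for none. */
static uint8_t
sketch_vm_flags(const EvalInputs *in, bool do_freeze)
{
    uint8_t flags = 0;

    if (!in->consider_update_vm)
        return 0;
    /* On-access, VM-only change: skip if it would dirty the page or need an FPI. */
    if (in->on_access && in->all_visible && !in->do_prune && !do_freeze &&
        (!in->buffer_dirty || in->buffer_needs_fpi))
        return 0;
    /* After repairing corruption, do not update the VM further. */
    if (in->corruption_fixed)
        return 0;
    if (in->all_visible &&
        (!in->blk_known_av || (in->all_frozen && !in->vm_all_frozen)))
    {
        flags |= SKETCH_VM_ALL_VISIBLE;
        if (in->all_frozen)
            flags |= SKETCH_VM_ALL_FROZEN;
    }
    return flags;
}
```

The ordering matches the comment added in heap_page_prune_and_freeze(): the freeze decision is made first because its result feeds the on-access guard in the VM decision.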



  [text/x-patch] v11-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 21-v11-0019-Reorder-heap_page_prune_and_freeze-parameters.patch)
From e75e86aac65557f119d5d00077cf21183b55ce46 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v11 19/20] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 67b56e45ad7..3e55c43f17b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -651,6 +651,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -669,30 +678,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -705,13 +705,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ad05ba4db6..4fb915e1d94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1993,11 +1993,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-09 14:00  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-09 14:00 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman
<[email protected]> wrote:
> But, I think you're right that maintaining the order of operations
> proposed in transam/README is more important. As such, in attached
> v11, I've modified this patch and the other patches where I replace
> visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
> lock the vmbuffer before the critical section.
> visibilitymap_set_vmbits() asserts that we have the vmbuffer
> exclusively locked, so we should be good.

That sounds good. I think it is OK to keep some of the odd things that
we're currently doing if they're hard to eliminate, but if they're not
really needed then I'd rather see us standardize the code. I feel (and
I think you may agree, based on other conversations that we've had)
that the visibility map code is somewhat oddly structured, and I'd
like to see us push the amount of oddness down rather than up, if we
can reasonably do so without breaking everything.

> The only difference is I replaced the phrase "LSN interlock" with
> "being dropped or truncated later in recovery" -- which is more
> specific and, I thought, more clear. Without this comment, it took me
> some time to understand the scenarios that might lead us to skip
> updating the heap block. heap_xlog_visible() has cause to describe
> this situation in an earlier comment -- which is why I think the LSN
> interlock comment is less confusing there.
>
> Anyway, I'm open to changing the comment. I could:
> 1) copy-paste the same comment as heap_xlog_visible()
> 2) refer to the comment in heap_xlog_visible() (comment seemed a bit
> short for that)
> 3) diverge the comments further by improving the new comment in
> heap_xlog_multi_insert() in some way
> 4) something else?

IMHO, copying and pasting comments is not great, and comments with
identical intent and divergent wording are also not great. The former
is not great because having a whole bunch of copies of the same
comment, especially if it's a block comment rather than a 1-liner,
uses up a bunch of space and creates a maintenance hazard in the sense
that future updates might not get propagated to all copies. The latter
is not great because it makes it hard to grep for other instances that
should be adjusted when you adjust one, and also because if one
version really is better than the other then ideally we'd like to have
the good version everywhere. Of course, there's some tension between
these two goals. In this particular case, thinking a little harder
about your proposed change, it seems to me that "LSN interlock" is
more clear about what the immediate test is that would cause us to
skip updating the heap page, and "being dropped or truncated later in
recovery" is more clear about what the larger state of the world that
would lead to that situation is. But whatever preference anyone might
have about which way to go with that choice, it is hard to see why the
preference should go one way in one case and the other way in another
case. Therefore, I favor an approach that leads either to an identical
comment in both places, or to one comment referring to the other.

> > The second paragraph does not convince me at all. I see no reason to
> > believe that this is safe, or that it is a good idea. The code in
> > heap_xlog_visible() thinks it's OK to unlock and relock the page to
> > make visibilitymap_set() happy, which is cringy but probably safe for
> > lack of concurrent writers, but skipping locking altogether seems
> > deeply unwise.
>
> Actually in master, heap_xlog_visible() has no lock on the heap page
> when it calls visibilitymap_set(). It releases that lock before
> recording the freespace in the FSM and doesn't take it again.
>
> It does unlock and relock the VM page -- because visibilitymap_set()
> expects to take the lock on the VM.
>
> I agree that not holding the heap lock while updating the VM is
> unsatisfying. We can't hold it while doing the IO to read in the VM
> block in XLogReadBufferForRedoExtended(). So, we could take it again
> before calling visibilitymap_set(). But we don't always have the heap
> buffer. I suspect this is partially why heap_xlog_visible()
> unconditionally passes InvalidBuffer to visibilitymap_set() as the
> heap buffer and has special case handling for recovery when we don't
> have the heap buffer.

You know, I wasn't thinking carefully enough about the distinction
between the heap page and the visibility map page here. I thought you
were saying that you were modifying a page without a lock on that
page, but you aren't: you're saying you're modifying a page without a
lock on another page to which it is related. The former seems
disastrous, but the latter might be OK. However, I'm sort of confused
about what the comment is trying to say to justify that:

+        * It is only okay to set the VM bits without holding the heap page lock
+        * because we can expect no other writers of this page.

It is not exactly clear to me whether "this page" here refers to the
heap page or the VM page. If it means the heap page, why should that
be so if we haven't got any kind of lock? If it means the VM page,
then why is the heap page even relevant?

-- 
Robert Haas
EDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-09 16:24  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-09 16:24 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Sep 9, 2025 at 10:00 AM Robert Haas <[email protected]> wrote:
>
> On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman
> <[email protected]> wrote:
>
> > The only difference is I replaced the phrase "LSN interlock" with
> > "being dropped or truncated later in recovery" -- which is more
> > specific and, I thought, more clear. Without this comment, it took me
> > some time to understand the scenarios that might lead us to skip
> > updating the heap block. heap_xlog_visible() has cause to describe
> > this situation in an earlier comment -- which is why I think the LSN
> > interlock comment is less confusing there.
> >
> > Anyway, I'm open to changing the comment. I could:
> > 1) copy-paste the same comment as heap_xlog_visible()
> > 2) refer to the comment in heap_xlog_visible() (comment seemed a bit
> > short for that)
> > 3) diverge the comments further by improving the new comment in
> > heap_xlog_multi_insert() in some way
> > 4) something else?
>
> IMHO, copying and pasting comments is not great, and comments with
> identical intent and divergent wording are also not great. The former
> is not great because having a whole bunch of copies of the same
> comment, especially if it's a block comment rather than a 1-liner,
> uses up a bunch of space and creates a maintenance hazard in the sense
> that future updates might not get propagated to all copies. The latter
> is not great because it makes it hard to grep for other instances that
> should be adjusted when you adjust one, and also because if one
> version really is better than the other then ideally we'd like to have
> the good version everywhere. Of course, there's some tension between
> these two goals. In this particular case, thinking a little harder
> about your proposed change, it seems to me that "LSN interlock" is
> more clear about what the immediate test is that would cause us to
> skip updating the heap page, and "being dropped or truncated later in
> recovery" is more clear about what the larger state of the world that
> would lead to that situation is. But whatever preference anyone might
> have about which way to go with that choice, it is hard to see why the
> preference should go one way in one case and the other way in another
> case. Therefore, I favor an approach that leads either to an identical
> comment in both places, or to one comment referring to the other.

I see what you are saying.

For heap_xlog_visible() the LSN interlock comment is easier to parse
because of an earlier comment before reading the heap page:

    /*
     * Read the heap page, if it still exists. If the heap file has dropped or
     * truncated later in recovery, we don't need to update the page, but we'd
     * better still update the visibility map.
     */

I've gone with the direct copy-paste of the LSN interlock paragraph in
attached v12. I think referring to the other comment is too confusing
in context here. However, I also added a line about what could cause
the LSN interlock -- but above it, so as to retain grep-ability of the
other comment.
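
To make the two wordings concrete, here is a minimal sketch in C of
the redo decision being described. Everything named Toy* or toy_* is a
hypothetical stand-in and not PostgreSQL code; the real path goes
through XLogReadBufferForRedoExtended() and its XLogRedoAction result.
The sketch assumes the heap block and the VM block are judged
separately, so skipping the heap page (because it is gone, or because
its LSN is already at or past the record) does not by itself mean the
VM page is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a block touched during redo. */
typedef struct ToyBlock
{
	bool		exists;		/* false once the relation is dropped or truncated */
	uint64_t	lsn;		/* LSN of the last record applied to this block */
	uint8_t		bits;		/* state the record wants to set on this block */
} ToyBlock;

typedef enum ToyRedoAction
{
	TOY_NEEDS_REDO,			/* block is older than the record: apply it */
	TOY_DONE,				/* block already reflects the record: skip */
	TOY_NOTFOUND			/* block no longer exists: skip */
} ToyRedoAction;

/* The "LSN interlock": only apply a record to a block that predates it. */
static ToyRedoAction
toy_read_block_for_redo(const ToyBlock *blk, uint64_t record_lsn)
{
	if (!blk->exists)
		return TOY_NOTFOUND;
	if (blk->lsn >= record_lsn)
		return TOY_DONE;
	return TOY_NEEDS_REDO;
}

/* Replay one record that sets a flag on the heap block and bits in the VM. */
static void
toy_redo_set_bits(ToyBlock *heap, ToyBlock *vm, uint64_t record_lsn)
{
	assert(record_lsn > 0);

	/* Heap block: may be skipped if dropped, truncated, or already newer. */
	if (toy_read_block_for_redo(heap, record_lsn) == TOY_NEEDS_REDO)
	{
		heap->bits |= 0x01;
		heap->lsn = record_lsn;
	}

	/* VM block: evaluated on its own, regardless of the heap outcome. */
	if (toy_read_block_for_redo(vm, record_lsn) == TOY_NEEDS_REDO)
	{
		vm->bits |= 0x03;
		vm->lsn = record_lsn;
	}
}
```

In this toy model, calling toy_redo_set_bits() with a heap block whose
exists field is false still updates the VM block, which is the
situation the added comment line is meant to describe.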

> > > The second paragraph does not convince me at all. I see no reason to
> > > believe that this is safe, or that it is a good idea. The code in
> > > heap_xlog_visible() thinks it's OK to unlock and relock the page to
> > > make visibilitymap_set() happy, which is cringy but probably safe for
> > > lack of concurrent writers, but skipping locking altogether seems
> > > deeply unwise.
> >
> > Actually in master, heap_xlog_visible() has no lock on the heap page
> > when it calls visibilitymap_set(). It releases that lock before
> > recording the freespace in the FSM and doesn't take it again.
> >
> > It does unlock and relock the VM page -- because visibilitymap_set()
> > expects to take the lock on the VM.
> >
> > I agree that not holding the heap lock while updating the VM is
> > unsatisfying. We can't hold it while doing the IO to read in the VM
> > block in XLogReadBufferForRedoExtended(). So, we could take it again
> > before calling visibilitymap_set(). But we don't always have the heap
> > buffer. I suspect this is partially why heap_xlog_visible()
> > unconditionally passes InvalidBuffer to visibilitymap_set() as the
> > heap buffer and has special case handling for recovery when we don't
> > have the heap buffer.
>
> You know, I wasn't thinking carefully enough about the distinction
> between the heap page and the visibility map page here. I thought you
> were saying that you were modifying a page without a lock on that
> page, but you aren't: you're saying you're modifying a page without a
> lock on another page to which it is related. The former seems
> disastrous, but the latter might be OK. However, I'm sort of confused
> about what the comment is trying to say to justify that:
>
> +        * It is only okay to set the VM bits without holding the heap page lock
> +        * because we can expect no other writers of this page.
>
> It is not exactly clear to me whether "this page" here refers to the
> heap page or the VM page. If it means the heap page, why should that
> be so if we haven't got any kind of lock? If it means the VM page,
> then why is the heap page even relevant?

I've expanded the comment in v12. In normal operation we must have the
lock on the heap page when setting the VM bits because if another
backend cleared PD_ALL_VISIBLE, we could have the forbidden scenario
where PD_ALL_VISIBLE is clear and the VM is set. This is not allowed
because then someone else may read the VM, conclude the page is
all-visible, and then an index-only scan can return wrong results. In
recovery, there are no concurrent writers, so it can't happen.
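
The forbidden interleaving can be sketched as a small single-threaded
C model. All names here (ToyHeapPage, TOY_VM_ALL_VISIBLE, the toy_*
functions) are hypothetical and only mimic the roles of
PD_ALL_VISIBLE and the VM bits; no real locking is modeled, the
"concurrent" step is simply run in between the setter's check and its
VM update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE	0x01	/* stand-in for VISIBILITYMAP_ALL_VISIBLE */
#define TOY_VM_ALL_FROZEN	0x02	/* stand-in for VISIBILITYMAP_ALL_FROZEN */

/* Stand-in for a heap page; only the page-level flag is modeled. */
typedef struct ToyHeapPage
{
	bool		pd_all_visible;
} ToyHeapPage;

/* Invariant: if the VM says all-visible, the page flag must be set too. */
static bool
toy_invariant_holds(const ToyHeapPage *page, uint8_t vmbits)
{
	return (vmbits & TOY_VM_ALL_VISIBLE) == 0 || page->pd_all_visible;
}

/* A writer modifying the page clears the page flag and the VM bits. */
static void
toy_clear_all_visible(ToyHeapPage *page, uint8_t *vmbits)
{
	page->pd_all_visible = false;
	*vmbits = 0;
}

/* Setter that works while nobody else can touch the page. */
static void
toy_set_all_visible(ToyHeapPage *page, uint8_t *vmbits, uint8_t flags)
{
	/* Never set all-frozen without all-visible. */
	assert(flags != TOY_VM_ALL_FROZEN);
	page->pd_all_visible = true;
	*vmbits |= flags;
}

/* An index-only scan trusts the VM and skips the heap fetch. */
static bool
toy_can_skip_heap_fetch(uint8_t vmbits)
{
	return (vmbits & TOY_VM_ALL_VISIBLE) != 0;
}

/*
 * The setter decided the page was all-visible, then a writer cleared the
 * flag before the setter touched the VM. Returns true if the invariant
 * ends up broken.
 */
static bool
toy_unlocked_interleaving_breaks_invariant(void)
{
	ToyHeapPage page = {.pd_all_visible = true};
	uint8_t		vmbits = 0;

	toy_clear_all_visible(&page, &vmbits);	/* concurrent writer */
	vmbits |= TOY_VM_ALL_VISIBLE;			/* setter, without recheck */

	return !toy_invariant_holds(&page, vmbits);
}
```

In this model toy_unlocked_interleaving_breaks_invariant() reports a
broken invariant, and toy_can_skip_heap_fetch() would then trust a VM
bit for a page that is no longer all-visible. With no writer step in
between, as is assumed for recovery, toy_set_all_visible() leaves the
invariant intact.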

It is worth discussing how to fix it in heap_xlog_visible() so that
future scenarios like parallel recovery could not break this. However,
this patch is not a deviation from the behavior on master, and,
technically the behavior on master works.

- Melanie


Attachments:

  [text/x-patch] v12-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch (5.4K, 2-v12-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch)
  download | inline diff:
From 28253a5c4cb60d842f83a6f3b90bb984ffd10f89 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v12 02/20] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..8a84bdfe0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2912,8 +2916,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3596,10 +3600,18 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3607,9 +3619,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3632,7 +3646,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3656,9 +3670,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3679,7 +3693,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3714,7 +3728,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [text/x-patch] v12-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (12.3K, 3-v12-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
  download | inline diff:
From a042cabf79da8faa583f081b432ddb955d6211bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v12 01/20] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 50 +++++++++++-------
 src/backend/access/heap/heapam_xlog.c   | 55 +++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 159 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..cff531a4801 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2504,9 +2508,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		/*
 		 * If the page is all visible, need to clear that, unless we're only
 		 * going to add further frozen rows to it.
-		 *
-		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2516,8 +2517,21 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
+
+		/*
+		 * If we're only adding already frozen rows to a previously empty
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
+		 */
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2579,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2647,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2660,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..86e5f76e49f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,58 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * Note that the heap relation may have been dropped or truncated, leading
+	 * us to skip updating the heap block.
+	 *
+	 * Even if we skipped the heap page update due to the LSN interlock, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		visibilitymap_set_vmbits(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0



  [text/x-patch] v12-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.4K, 4-v12-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch)
  download | inline diff:
From e5c62e789cc7757e61543aa628877f9bcab4dcac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v12 05/20] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 126 +++++++++++++++++----------
 1 file changed, 79 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a1cdaaebb57..e9b4e924d22 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1934,6 +1940,72 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2080,9 +2152,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2134,51 +2211,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0



  [text/x-patch] v12-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (28.7K, 5-v12-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
From 684e2b681adfee93d5155cb77df3062188ae2dbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v12 03/20] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.

This can decrease the number of WAL records vacuum phase III emits by
as much as half.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
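A small sketch of how the VM bits share the xl_heap_prune flags field, as
described in the commit message. The bit values are the ones this patch
uses: the low two bits carry VISIBILITYMAP_ALL_VISIBLE and
VISIBILITYMAP_ALL_FROZEN, and the XLHP_* flags start at (1 << 2). The
model_* helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as used by this patch and visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03
#define XLHP_IS_CATALOG_REL			(1 << 2)
#define XLHP_CLEANUP_LOCK			(1 << 3)

/*
 * Hypothetical helper: build the flags word the way
 * log_heap_prune_and_freeze() starts it, i.e. seeded with the VM bits.
 */
static uint16_t
model_pack_flags(uint8_t vmflags, bool catalog_rel, bool cleanup_lock)
{
	uint16_t	flags;

	/* Only the two VM bits may be passed in */
	assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);

	flags = vmflags;
	if (catalog_rel)
		flags |= XLHP_IS_CATALOG_REL;
	if (cleanup_lock)
		flags |= XLHP_CLEANUP_LOCK;
	return flags;
}

/*
 * Hypothetical helper: recover the VM bits the way the redo routine does,
 * by masking with VISIBILITYMAP_VALID_BITS.
 */
static uint8_t
model_unpack_vmflags(uint16_t flags)
{
	return (uint8_t) (flags & VISIBILITYMAP_VALID_BITS);
}
```

Because the VM bits occupy the low two bits, replay can mask them out
without knowing which other XLHP_* flags are set.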
 src/backend/access/heap/heapam_xlog.c  | 156 +++++++++++++++++++----
 src/backend/access/heap/pruneheap.c    |  66 ++++++++--
 src/backend/access/heap/vacuumlazy.c   | 166 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |   6 +-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |  36 ++++--
 6 files changed, 343 insertions(+), 96 deletions(-)
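
heap_page_would_be_all_visible() below tolerates LP_DEAD items only when
they match the caller's sorted deadoffsets array, walking that array in step
with the page's line pointers. A simplified standalone model of that walk,
assuming a hypothetical ItemState enum in place of real line pointers and
tuple visibility checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

/* Hypothetical per-item state; the real code inspects ItemIds and tuples. */
typedef enum
{
	ITEM_UNUSED,
	ITEM_VISIBLE,
	ITEM_NOT_VISIBLE,
	ITEM_DEAD
} ItemState;

/*
 * Returns true if every item is visible to all, ignoring LP_DEAD items that
 * appear in deadoffsets[].  deadoffsets[] must be strictly ascending, so a
 * single cursor (matched) is enough to check membership.
 */
static bool
model_would_be_all_visible(const ItemState *items, int nitems,
						   const OffsetNumber *deadoffsets, int ndeadoffsets)
{
	int			matched = 0;

	assert(ndeadoffsets == 0 || deadoffsets != NULL);

	for (int i = 0; i < nitems; i++)
	{
		OffsetNumber offnum = (OffsetNumber) (i + 1);	/* offsets are 1-based */

		switch (items[i])
		{
			case ITEM_UNUSED:
			case ITEM_VISIBLE:
				break;
			case ITEM_NOT_VISIBLE:
				return false;
			case ITEM_DEAD:
				/* An LP_DEAD item the caller did not list is disqualifying */
				if (deadoffsets == NULL ||
					matched >= ndeadoffsets ||
					deadoffsets[matched] != offnum)
					return false;
				matched++;
				break;
		}
	}

	return true;
}
```

Passing NULL and 0 for the dead offsets gives the stricter behavior of the
heap_page_is_all_visible() wrapper: any LP_DEAD item makes the page not
all-visible.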

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 86e5f76e49f..5872f13397f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +156,128 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, measure the page's freespace to later update the
+	 * freespace map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * Note that the heap relation may have been dropped or truncated, leading
+	 * us to skip updating the heap block.
+	 *
+	 * Even if we skipped the heap page update due to the LSN interlock, it's
+	 * still safe to update the visibility map.  Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
 
-			UnlockReleaseBuffer(buffer);
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+
+		FreeFakeRelcacheEntry(reln);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+	xlrec.flags = vmflags;
 
-	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a84bdfe0a9..51067264004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2852,8 +2854,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2864,6 +2869,23 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2883,6 +2905,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2892,7 +2925,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2901,39 +2937,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3600,40 +3609,85 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
  *
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets the caller knows about and already removed
+ * associated index entries. Vacuum will call this before setting those line
+ * pointers LP_UNUSED. So, if there are no new LP_DEAD items, then the page
+ * can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
  *
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
  *
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
  *
  * *logging_offnum will have the OffsetNumber of the current tuple being
  * processed for vacuum's error callback system.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3661,9 +3715,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..439f33b8061 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
 
-/* to handle recovery conflict during logical decoding on standby */
-#define		XLHP_IS_CATALOG_REL			(1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define		XLHP_IS_CATALOG_REL			(1 << 2)
 
 /*
  * Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
  * marks LP_DEAD line pointers as unused without moving any tuple data, an
  * ordinary exclusive lock is sufficient.
  */
-#define		XLHP_CLEANUP_LOCK	       (1 << 2)
+#define		XLHP_CLEANUP_LOCK	       (1 << 3)
 
 /*
  * If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
  * there are no queries running for which the removed tuples are still
  * visible, or which still consider the frozen XIDs as running.
  */
-#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 3)
+#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
 
 /*
  * Indicates that an xlhp_freeze_plans sub-record and one or more
  * xlhp_freeze_plan sub-records are present.
  */
-#define		XLHP_HAS_FREEZE_PLANS		(1 << 4)
+#define		XLHP_HAS_FREEZE_PLANS		(1 << 5)
 
 /*
  * XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
  * indicate that xlhp_prune_items sub-records with redirected, dead, and
  * unused item offsets are present.
  */
-#define		XLHP_HAS_REDIRECTIONS		(1 << 5)
-#define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
-#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
+#define		XLHP_HAS_REDIRECTIONS		(1 << 6)
+#define		XLHP_HAS_DEAD_ITEMS	        (1 << 7)
+#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)
 
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
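
The flag renumbering in the heapam_xlog.h hunk above can be sketched as a small standalone program. This is only an illustration, not code from the patch: the SK_ names and helper functions are hypothetical, and the bit positions are copied from the hunk. It shows why the flags field has to grow to uint16 once bits 0 and 1 are reserved for the VM bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; bit positions copied from the hunk above. */
#define SK_VM_ALL_VISIBLE				(1 << 0)
#define SK_VM_ALL_FROZEN				(1 << 1)
#define SK_VM_VALID_BITS				(SK_VM_ALL_VISIBLE | SK_VM_ALL_FROZEN)
#define SK_XLHP_IS_CATALOG_REL			(1 << 2)
#define SK_XLHP_CLEANUP_LOCK			(1 << 3)
#define SK_XLHP_HAS_NOW_UNUSED_ITEMS	(1 << 8)

/* Build a flags word the way a prune record carrying VM bits might. */
static uint16_t
sk_build_flags(uint8_t vmbits, bool catalog_rel, bool cleanup_lock,
			   bool has_now_unused)
{
	uint16_t	flags = vmbits & SK_VM_VALID_BITS;

	if (catalog_rel)
		flags |= SK_XLHP_IS_CATALOG_REL;
	if (cleanup_lock)
		flags |= SK_XLHP_CLEANUP_LOCK;
	if (has_now_unused)
		flags |= SK_XLHP_HAS_NOW_UNUSED_ITEMS;
	return flags;
}

/* Recover only the VM bits, as the heap2_desc() hunk does with a mask. */
static uint8_t
sk_vm_bits(uint16_t flags)
{
	return (uint8_t) (flags & SK_VM_VALID_BITS);
}

/* True if the flags word would still fit in the old uint8 field. */
static bool
sk_fits_in_uint8(uint16_t flags)
{
	return flags <= UINT8_MAX;
}
```

With the two low bits reserved, the highest XLHP flag moves to bit 8, so any record that reports now-unused items no longer fits in a single byte.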



  [text/x-patch] v12-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch (5.9K, 6-v12-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch)
From 3d979ac727c20e964a77552c7a5f06f45a7aff7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v12 04/20] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51067264004..a1cdaaebb57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,49 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			visibilitymap_set_vmbits(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2925,6 +2941,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool vm_modified_heap_page,
-- 
2.43.0
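
The full-page-image decision that this patch adds to log_heap_prune_and_freeze() can be modelled as a pure function. The sketch below is an assumption-laden illustration: the SK_ constants use made-up values and are not claimed to match xloginsert.h, and hint_bit_wal_needed stands in for the result of XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real REGBUF_* values may differ. */
#define SK_REGBUF_STANDARD		0x01
#define SK_REGBUF_FORCE_IMAGE	0x02
#define SK_REGBUF_NO_IMAGE		0x04

/*
 * Mirror the if / else-if chain from the pruneheap.c hunk above:
 * a forced FPI wins; otherwise the image can be skipped when the record
 * neither prunes nor freezes and setting PD_ALL_VISIBLE does not have to
 * be WAL-logged as a hint.
 */
static uint8_t
sk_choose_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
					   bool set_pd_all_vis, bool hint_bit_wal_needed)
{
	uint8_t		flags = SK_REGBUF_STANDARD;

	if (force_heap_fpi)
		flags |= SK_REGBUF_FORCE_IMAGE;
	else if (!do_prune &&
			 nfrozen == 0 &&
			 (!set_pd_all_vis || !hint_bit_wal_needed))
		flags |= SK_REGBUF_NO_IMAGE;

	return flags;
}
```

In the empty-page case from lazy_scan_new_or_empty(), force_heap_fpi is true only when the page LSN is still invalid, which is the situation the relocated comment describes.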



  [text/x-patch] v12-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (12.1K, 7-v12-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch)
From 133a81c37921fc4a13e85795dc3a6f39726b0254 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v12 07/20] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().

This commit is only really meant for review, as it adds a member to
PruneFreezeResult (vm_corruption) that is removed in later commits.
---
 src/backend/access/heap/pruneheap.c  | 93 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 84 +++----------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 102 insertions(+), 79 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..54af3296b91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,70 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +387,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +426,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +976,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1febb524d41..574e415b0e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,72 +1934,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2063,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2152,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0
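
The two checks in identify_and_fix_vm_corruption() reduce to a small decision over four inputs. The sketch below restates them as a pure classifier so the cases are easy to see. The enum and function names are hypothetical, vm_status stands in for the result of visibilitymap_get_status(), and the real function also emits a WARNING and clears the offending bits, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for the sketch. */
typedef enum
{
	SK_VM_OK,
	SK_VM_BIT_SET_PAGE_BIT_CLEAR,
	SK_LP_DEAD_ON_ALL_VISIBLE_PAGE
} SkVmCorruption;

static SkVmCorruption
sk_classify_vm_corruption(bool blk_known_av, bool page_all_visible,
						  uint8_t vm_status, int64_t nlpdead_items)
{
	/*
	 * The VM said all-visible earlier, the page-level bit is clear, and a
	 * recheck of the VM under buffer lock still shows a bit set.
	 */
	if (blk_known_av && !page_all_visible && vm_status != 0)
		return SK_VM_BIT_SET_PAGE_BIT_CLEAR;

	/* LP_DEAD items must never exist on a PD_ALL_VISIBLE page. */
	if (nlpdead_items > 0 && page_all_visible)
		return SK_LP_DEAD_ON_ALL_VISIBLE_PAGE;

	return SK_VM_OK;
}
```

The recheck of vm_status matters: if the bit was legitimately cleared after the block was chosen, the first case does not fire and nothing is reported.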



  [text/x-patch] v12-0006-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 8-v12-0006-Combine-vacuum-phase-I-VM-update-cases.patch)
From f73b16cbea6a580ed7cf0c72c37c9a3251fa4cf4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v12 06/20] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9b4e924d22..1febb524d41 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2159,11 +2159,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2176,21 +2191,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2211,66 +2234,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
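
The merged condition in lazy_scan_prune() is easier to check in isolation. The following sketch restates it as two pure functions. All names are hypothetical: corruption_found stands for the preceding corruption branch, vm_all_frozen for the result of VM_ALL_FROZEN(), and hint_bit_wal_needed for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Should phase I update the VM for this page?  Covers both former cases:
 * the VM bit is not yet set, or it is set and the page now also qualifies
 * as all-frozen while the VM does not say so.
 */
static bool
sk_should_update_vm(bool corruption_found, bool all_visible, bool all_frozen,
					bool all_visible_according_to_vm, bool vm_all_frozen)
{
	if (corruption_found)
		return false;		/* don't update the VM right after clearing it */

	return all_visible &&
		(!all_visible_according_to_vm ||
		 (all_frozen && !vm_all_frozen));
}

/*
 * Must the heap buffer be dirtied before the VM is set?  Follows the new
 * "if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())" test.
 */
static bool
sk_must_dirty_heap_page(bool page_all_visible, bool hint_bit_wal_needed)
{
	return !page_all_visible || hint_bit_wal_needed;
}
```

The second function is where the bug fix mentioned in the commit message shows up: a page that is already PD_ALL_VISIBLE still gets dirtied when checksums or wal_log_hints are enabled.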



  [text/x-patch] v12-0010-Rename-PruneState.freeze-to-attempt_freeze.patch (3.7K, 9-v12-0010-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 8277be755b187e66a57bfff15a2d46f98656f4ca Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v12 10/20] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the field indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
ultimately will end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 398962ed1cb..df3e6439176 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -445,13 +445,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -473,7 +473,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -520,7 +520,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -634,7 +634,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -750,7 +750,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -783,7 +783,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1046,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->old_vmbits = old_vmbits;
 	presult->new_vmbits = vmflags;
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1628,7 +1628,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v12-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (3.1K, 10-v12-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch)
From 6eecf49c63134c561082dc6a85fdb35d752aea53 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v12 08/20] Keep all_frozen updated too in
 heap_page_prune_and_freeze

We previously relied on only using all-visible and all-frozen together
but it's best to keep them both updated.

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 54af3296b91..bbd83e4fcc7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -830,6 +826,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1474,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1496,7 +1493,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1509,7 +1506,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1528,7 +1525,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1546,7 +1543,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0
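
For readers skimming the patch above, the rule it establishes can be sketched in isolation: whenever all_visible is cleared, all_frozen is cleared with it, so that all_frozen implies all_visible at every point. This is only an illustrative sketch. VisState, vis_state_is_consistent() and vis_state_clear() are hypothetical names that do not exist in pruneheap.c, where the patch clears the two PruneState fields inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two PruneState fields the patch touches. */
typedef struct VisState
{
	bool		all_visible;
	bool		all_frozen;
} VisState;

/* The invariant checked by the patch's new Assert(): frozen implies visible. */
static inline bool
vis_state_is_consistent(const VisState *vs)
{
	return !vs->all_frozen || vs->all_visible;
}

/*
 * Hypothetical helper: clear both flags together, mirroring the
 * "all_visible = all_frozen = false" assignments in the patch.
 */
static inline void
vis_state_clear(VisState *vs)
{
	vs->all_visible = vs->all_frozen = false;
	assert(vis_state_is_consistent(vs));
}
```

Clearing both fields at every site is what lets later code read all_frozen without first checking all_visible.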



  [text/x-patch] v12-0009-Update-VM-in-pruneheap.c.patch (12.5K, 11-v12-0009-Update-VM-in-pruneheap.c.patch)
From 2e2fc840a3af65fec6eee2a8eb2de30839a8ca52 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v12 09/20] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bbd83e4fcc7..398962ed1cb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,7 +366,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -442,6 +443,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -942,7 +945,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -958,31 +961,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 574e415b0e0..9492423141e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,7 +1949,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1979,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,88 +2079,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to me marked all-frozen, update the VM. Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
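
The counter bookkeeping that lazy_scan_prune() does with old_vmbits and new_vmbits after the patch above can be sketched as a standalone function. This is a simplified sketch, not the vacuumlazy.c code: VmCounters and sketch_count_vm_transition() are hypothetical, and the SKETCH_VM_* macros are local stand-ins for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN with values assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the VM flag bits; values are assumed for illustration. */
#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

/* Hypothetical counterpart of the three LVRelState counters. */
typedef struct VmCounters
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} VmCounters;

/*
 * Classify one page by comparing the VM bits before and after the update.
 * new_vmbits is 0 when the VM was not touched, so nothing is counted then.
 */
static void
sketch_count_vm_transition(uint8_t old_vmbits, uint8_t new_vmbits,
						   VmCounters *c)
{
	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0)
	{
		/* Page was newly set all-visible, and possibly all-frozen too. */
		c->new_visible++;
		if ((new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
			c->new_visible_frozen++;
	}
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is now also all-frozen. */
		assert((new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0);
		c->new_frozen++;
	}
}
```

Because the caller only sees the two bit sets, it no longer needs to know whether corruption was fixed or whether the VM was concurrently modified; both cases show up as "no transition".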



  [text/x-patch] v12-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch (29.2K, 12-v12-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch)
From 73418906b5aca553da545d77c1f0d29cd3d2f0b4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v12 11/20] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 459 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 282 insertions(+), 222 deletions(-)
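
As a reading aid for the pruneheap.c changes in this patch, the decision of which VM bits to set (made before entering the critical section) can be sketched as a pure function. This is a simplified sketch under stated assumptions: sketch_choose_vmflags() is hypothetical, its boolean parameters stand in for the result of identify_and_fix_vm_corruption(), the prstate fields, blk_known_av and the VM_ALL_FROZEN() lookup, and the SKETCH_VM_* values are assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VM flag bits; values are assumed for illustration. */
#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

/*
 * Return the VM bits to set for a page, or 0 if the VM should be left alone.
 * all_visible and all_frozen are expected to already account for LP_DEAD
 * items, i.e. both are false if any remain on the page.
 */
static uint8_t
sketch_choose_vmflags(bool fixed_corruption, bool all_visible,
					  bool all_frozen, bool blk_known_av,
					  bool vm_all_frozen)
{
	uint8_t		vmflags = 0;

	assert(!all_frozen || all_visible);

	/* If corruption was just fixed, don't update the VM further. */
	if (fixed_corruption)
		return 0;

	/*
	 * Update the VM if the page is not yet known all-visible, or if it is
	 * and can now additionally be marked all-frozen.
	 */
	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		vmflags |= SKETCH_VM_ALL_VISIBLE;
		if (all_frozen)
			vmflags |= SKETCH_VM_ALL_FROZEN;
	}

	return vmflags;
}
```

A nonzero result corresponds to do_set_vm being true in the patch, which in turn is what triggers locking vmbuffer ahead of the critical section.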

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index df3e6439176..dce9025d268 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -377,12 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -398,6 +408,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -442,18 +454,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -498,50 +516,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping. Initializing all_visible to false allows skipping the work
+	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -739,10 +764,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -790,7 +816,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -829,11 +855,88 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -849,15 +952,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -871,12 +975,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -888,35 +1027,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If any removed tuple's xmax is younger than the conflict_xid
+			 * calculated so far, we must use that xmax as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -928,124 +1088,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to me marked all-frozen, update the VM. Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1627,7 +1718,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
@@ -2190,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
@@ -2244,6 +2340,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
 	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+	Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
 	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 	xlrec.flags = vmflags;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9492423141e..75205179b83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2015,34 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2076,8 +2048,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
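
For readers following the snapshot conflict horizon logic in the
heap_page_prune_and_freeze() hunk above, here is a minimal standalone
sketch of the selection order, assuming a simplified model. The struct
HorizonInputs and the helpers xid_follows(), xid_retreat() and
choose_conflict_xid() are hypothetical names made up for illustration;
they are not part of the patch. The field setting_all_frozen stands in
for (vmflags & VISIBILITYMAP_ALL_FROZEN), and the XID comparison only
approximates TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)

/* Hypothetical container for the inputs the hunk consults. */
typedef struct HorizonInputs
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		all_frozen_except_lp_dead;
	bool		blk_known_av;
	bool		setting_all_frozen;	/* vmflags has the all-frozen bit */
	TransactionId visibility_cutoff_xid;
	TransactionId oldest_xmin;
	TransactionId latest_xid_removed;
} HorizonInputs;

/* Wraparound-aware "id1 is newer than id2" for normal XIDs. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Step back one XID, skipping the special (non-normal) XIDs. */
static TransactionId
xid_retreat(TransactionId xid)
{
	do
		xid--;
	while (xid < FirstNormalTransactionId);
	return xid;
}

/* Mirrors the order of the checks in the hunk; assumption-laden sketch. */
static TransactionId
choose_conflict_xid(const HorizonInputs *in)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* VM update, or freezing a page that will end up all-frozen */
	if (in->do_set_vm || (in->do_freeze && in->all_frozen_except_lp_dead))
		conflict_xid = in->visibility_cutoff_xid;
	/* Freezing only part of the page: pessimistic OldestXmin - 1 */
	else if (in->do_freeze)
		conflict_xid = xid_retreat(in->oldest_xmin);

	/* Removed tuples with a younger xmax win */
	if (xid_follows(in->latest_xid_removed, conflict_xid))
		conflict_xid = in->latest_xid_removed;

	/* Already all-visible page only being marked all-frozen: no conflict */
	if (!in->do_prune && !in->do_freeze && in->do_set_vm &&
		in->blk_known_av && in->setting_all_frozen)
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

The point of the sketch is only the precedence: the visibility cutoff
is tried first, the OldestXmin fallback applies to partial freezes, a
younger removed xmax overrides either, and the VM-only all-frozen case
drops the horizon entirely.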



  [text/x-patch] v12-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 13-v12-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 1345b081eff8eabc84ee7026a6be1a4ee5a45f47 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v12 13/20] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all in order to determine whether to set the page
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6637966e927..0211effeec7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -580,9 +580,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1182,11 +1182,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v12-0012-Remove-xl_heap_visible-entirely.patch (25.1K, 14-v12-0012-Remove-xl_heap_visible-entirely.patch)
From 1980ef906cb104da3a97a597ee69de73a21bdf0e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v12 12/20] Remove xl_heap_visible entirely

There are now no users of this, so eliminate it entirely.
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 160 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +--------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 34 insertions(+), 370 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cff531a4801..6f161a6eab2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2526,11 +2527,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8801,49 +8802,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5872f13397f..9f16ba68d16 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
 	vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
 
 	/*
 	 * After xl_heap_prune is the optional snapshot conflict horizon.
@@ -243,9 +245,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	 * the VM is set.
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
-	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * without holding a lock on the heap page is considered safe enough.
 	 */
 	if (vmflags & VISIBILITYMAP_VALID_BITS &&
 		XLogReadBufferForRedoExtended(record, 1,
@@ -261,7 +261,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -280,142 +280,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -793,9 +657,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 * the VM is set.
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
-	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * without holding a lock on the heap page is considered safe enough.
 	 */
 	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -808,15 +670,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
-
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
@@ -1397,9 +1258,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dce9025d268..6637966e927 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -989,8 +989,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			Assert(PageIsAllVisible(page));
-			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75205179b83..2dcca071a45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,8 +1888,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			visibilitymap_set_vmbits(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2757,9 +2757,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		Assert(!PageIsAllVisible(page));
 		set_pd_all_vis = true;
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 439f33b8061..3342af02c75 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -344,13 +344,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -455,9 +448,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
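
For readers following the visibilitymap_set() changes in the patch above: the bit manipulation is the same in the removed seven-argument function and in the surviving four-argument one. The following is only a minimal sketch of that packing, assuming two bits per heap block and a single in-memory byte array. The TOY_* names and toy_vm_set() are hypothetical stand-ins, not PostgreSQL identifiers, and the sketch omits buffer locking, WAL and the division of the map into pages.

```c
#include <assert.h>
#include <stdint.h>

/* Toy constants, assumed for illustration only. */
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	(8 / TOY_BITS_PER_HEAPBLOCK)
#define TOY_ALL_VISIBLE			0x01
#define TOY_ALL_FROZEN			0x02
#define TOY_VALID_BITS			0x03

/*
 * Set flags for heap block heapBlk in a toy bitmap and return the bits
 * that were set for that block beforehand.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = heapBlk / TOY_HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (uint8_t) ((heapBlk % TOY_HEAPBLOCKS_PER_BYTE) *
									   TOY_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Only the two valid bits may be passed. */
	assert((flags & TOY_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible. */
	assert(flags != TOY_ALL_FROZEN);

	status = (map[mapByte] >> mapOffset) & TOY_VALID_BITS;
	if (flags != status)
		map[mapByte] |= (uint8_t) (flags << mapOffset);

	return status;
}
```

Returning the previous bits lets a caller tell whether the call changed anything, which is how the pruneheap.c hunk above uses old_vmbits.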



  [text/x-patch] v12-0015-Inline-TransactionIdFollows-Precedes.patch (5.0K, 15-v12-0015-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From ba7442b7b550e1510d49c9df7eb23ddaf8533644 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v12 15/20] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)
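
As a note for reviewers, the comparison being inlined can be tried outside the tree. This is a sketch under stated assumptions: ToyXid, TOY_FIRST_NORMAL_XID and the toy_* functions are hypothetical stand-ins for TransactionId, FirstNormalTransactionId and the functions moved by this patch. It shows why normal XIDs are compared through a signed 32-bit difference rather than a plain unsigned comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions, assumed for illustration. */
typedef uint32_t ToyXid;
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)

static inline bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Is id1 logically < id2, treating normal XIDs as a circular space? */
static inline bool
toy_xid_precedes(ToyXid id1, ToyXid id2)
{
	int32_t		diff;

	/* Permanent XIDs sort below all normal XIDs: plain unsigned compare. */
	if (!toy_xid_is_normal(id1) || !toy_xid_is_normal(id2))
		return id1 < id2;

	/* Modulo-2^32 comparison: the sign of the difference gives the order. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

/* Return whichever of two normal XIDs is logically newer. */
static inline ToyXid
toy_xid_newest(ToyXid a, ToyXid b)
{
	assert(toy_xid_is_normal(a) && toy_xid_is_normal(b));
	return toy_xid_precedes(a, b) ? b : a;
}
```

With this definition an XID just below 2^32 precedes a small XID allocated after wraparound, which a plain unsigned comparison would get backwards.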

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0



  [text/x-patch] v12-0016-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 16-v12-0016-Unset-all-visible-sooner-if-not-freezing.patch)
  download | inline diff:
From b3f607261a22bad37a3aba9091dbee049d424eda Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v12 16/20] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
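
The rule described in the commit message can be reduced to a few lines. This is a minimal sketch only: ToyPruneState and toy_record_dead() are hypothetical and keep just the fields named in the patch, so they are not the real PruneState or heap_prune_record_dead().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subset of the pruning state; the struct itself is an assumption. */
typedef struct ToyPruneState
{
	bool		attempt_freeze; /* will this caller try to freeze? */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;	/* dead items recorded so far */
} ToyPruneState;

/* Record one dead item, following the rule added by this patch. */
static void
toy_record_dead(ToyPruneState *prstate)
{
	assert(prstate != NULL);

	/*
	 * A freezing caller keeps the flags for now so that it can still
	 * consider freezing the page; a non-freezing caller gives up on
	 * all-visible immediately.
	 */
	if (!prstate->attempt_freeze)
		prstate->all_visible = prstate->all_frozen = false;

	prstate->lpdead_items++;
}
```

For a freezing caller the flags survive the dead item and are cleared later, before the VM is updated. For a non-freezing caller they are cleared on the first dead item.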

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c6935e45cec..ba8ddc7fa35 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1503,8 +1503,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1762,8 +1765,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v12-0014-Use-GlobalVisState-to-determine-page-level-visib.patch (10.8K, 17-v12-0014-Use-GlobalVisState-to-determine-page-level-visib.patch)
  download | inline diff:
From e993249ab9d2ec35202cba48cce4a8928dd03ab9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v12 14/20] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page. If we have
maintained the visibility_cutoff_xid, we then compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, because
we previously set all_visible to false as soon as we encountered a live
tuple newer than OldestXmin. However, these extra comparisons were found
not to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 19 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 39 deletions(-)
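
The once-per-page check described in the commit message can be sketched as follows. Everything here is an assumption made for illustration: ToyHorizon stands in for a GlobalVisState with a single cutoff, toy_visible_to_all() for GlobalVisXidVisibleToAll(), and plain unsigned comparison is used in place of wraparound-aware XID comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Hypothetical horizon: XIDs below oldest_running are visible to everyone. */
typedef struct ToyHorizon
{
	ToyXid		oldest_running;
	int			nchecks;		/* how often the horizon was consulted */
} ToyHorizon;

/* Stand-in for the (more expensive) horizon test. */
static bool
toy_visible_to_all(ToyHorizon *h, ToyXid xid)
{
	h->nchecks++;
	return xid < h->oldest_running;
}

/*
 * Track the newest xmin among the live tuples, then consult the horizon
 * once for the whole page instead of once per tuple.
 */
static bool
toy_page_all_visible(ToyHorizon *h, const ToyXid *xmins, size_t ntuples)
{
	ToyXid		cutoff = 0;		/* newest xmin seen so far; 0 means none */
	size_t		i;

	assert(h != NULL);

	for (i = 0; i < ntuples; i++)
	{
		if (xmins[i] > cutoff)
			cutoff = xmins[i];
	}

	/* If even the newest xmin is visible to all, so is every older one. */
	if (cutoff != 0 && !toy_visible_to_all(h, cutoff))
		return false;

	return true;
}
```

The trade-off is the one the commit message names: the loop no longer exits early on a too-new tuple, but the horizon is consulted at most once per page.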

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0211effeec7..c6935e45cec 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -559,14 +558,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. When
+	 * the page is being set all-visible or all live tuples on the page are
+	 * being frozen, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -762,6 +759,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1108,12 +1115,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1638,19 +1643,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2dcca071a45..4ad05ba4db6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2717,7 +2717,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3462,13 +3462,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3487,7 +3487,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3508,7 +3508,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3580,8 +3580,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3600,8 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v12-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.1K, 18-v12-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 1bc54d63a946c7999427716a711bb6be9db74861 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v12 17/20] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 67 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 +++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 277 insertions(+), 40 deletions(-)
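
As a rough illustration of the plumbing described in the commit message
above (not code from the patch), the sketch below shows how a per-query
"modifies relation" bit could become a scan flag, and how the on-access
guard added to heap_page_prune_and_freeze() decides to skip the VM update.
All sketch_* functions and SKETCH_* constants are hypothetical stand-ins,
and the buffer_dirty / needs_fpi booleans stand in for the real
BufferIsDirty() and XLogCheckBufferNeedsBackup() calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ScanOptions bits touched by the patch. */
enum
{
	SKETCH_SO_TYPE_SEQSCAN = 1 << 0,
	SKETCH_SO_ALLOW_VM_SET = 1 << 10
};

/*
 * Only a scan of a relation that the query does not modify is allowed to
 * set the VM, mirroring the "if (!modifies_rel)" test in the patch.
 */
static uint32_t
sketch_scan_flags(bool modifies_rel)
{
	uint32_t	flags = SKETCH_SO_TYPE_SEQSCAN;

	if (!modifies_rel)
		flags |= SKETCH_SO_ALLOW_VM_SET;
	return flags;
}

/*
 * Returns true if an on-access call should give up on setting the VM:
 * nothing is being pruned or frozen, and setting the bit would either
 * newly dirty the heap page or force a full-page image into the WAL.
 */
static bool
sketch_skip_vm_set(bool on_access, bool consider_update_vm,
				   bool all_visible, bool do_prune, bool do_freeze,
				   bool buffer_dirty, bool needs_fpi)
{
	return on_access &&
		consider_update_vm &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi);
}
```

Under these assumptions, a read-only scan of a clean, already-pruned page
leaves the VM alone, while vacuum (on_access == false) is never affected by
the guard.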

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6f161a6eab2..f9e50d47aee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba8ddc7fa35..69d8e42bdc8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -519,12 +531,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -536,7 +553,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -885,12 +902,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
@@ -2284,8 +2319,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a singel WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v12-0018-Add-helper-functions-to-heap_page_prune_and_free.patch (19.2K, 19-v12-0018-Add-helper-functions-to-heap_page_prune_and_free.patch)
From 6c3258af5d7959a275876c5ce694fe6923be821e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v12 18/20] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup -- where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine, based on caller-provided options,
   heuristics, and state gathered during stage 2, whether or not to
   freeze tuples and set the page's bits in the VM
4) execution -- where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

XXX: For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 473 +++++++++++++++++-----------
 1 file changed, 296 insertions(+), 177 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 69d8e42bdc8..67b56e45ad7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -382,6 +398,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page, even though they can be derived from buffer, to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -772,20 +1031,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -796,186 +1065,36 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
 
-	/* Lock vmbuffer before entering a critical section */
+	/* Lock vmbuffer before entering critical section */
 	if (do_set_vm)
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
 	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
-
-	/* Save these for the caller in case we later zero out vmflags */
-	presult->new_vmbits = vmflags;
-
-	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0



  [text/x-patch] v12-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 20-v12-0019-Reorder-heap_page_prune_and_freeze-parameters.patch)
From f9780190dda309979ed52a820e623cecdbac3ad8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v12 19/20] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 67b56e45ad7..3e55c43f17b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -651,6 +651,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -669,30 +678,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -705,13 +705,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ad05ba4db6..4fb915e1d94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1993,11 +1993,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v12-0020-Set-pd_prune_xid-on-insert.patch (6.5K, 21-v12-0020-Set-pd_prune_xid-on-insert.patch)
From 95c4ebccc8f78b106f43f709a7a657c5102cd2a7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v12 20/20] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9e50d47aee..09d97896c66 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2548,8 +2552,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 9f16ba68d16..321d6a0d960 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -479,6 +479,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -628,9 +634,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-09 19:26  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-09 19:26 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Sep 9, 2025 at 12:24 PM Melanie Plageman
<[email protected]> wrote:
> For heap_xlog_visible() the LSN interlock comment is easier to parse
> because of an earlier comment before reading the heap page:
>
>     /*
>      * Read the heap page, if it still exists. If the heap file has dropped or
>      * truncated later in recovery, we don't need to update the page, but we'd
>      * better still update the visibility map.
>      */
>
> I've gone with the direct copy-paste of the LSN interlock paragraph in
> attached v12. I think referring to the other comment is too confusing
> in context here. However, I also added a line about what could cause
> the LSN interlock -- but above it, so as to retain grep-ability of the
> other comment.

I think that reads a little strangely. I would consolidate: Note that
the heap relation may have been dropped or truncated, leading us to
skip updating the heap block due to the LSN interlock. However, even
in that case, it's still safe to update the visibility map, etc. The
rest of the comment is perhaps a tad more explicit than our usual
practice, but that might be a good thing, because sometimes we're a
little too terse about these critical details.

I just realized that I don't like this:

+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */

The thing is, we rarely position a block comment just before an "else
if". There are probably instances, but it's not typical. That's why
the existing comment contains two "if blah then blah" statements of
which you deleted the second -- because it needed to cover both the
"if" and the "else if". An alternative style is to move the comment
down a nesting level and rephrase without the conditional, ie. "We're
only adding frozen rows to a previously empty page, so mark it as
all-frozen etc." But I don't know that I like doing that for one
branch of the "if" and not the other.
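To make the two comment placements concrete, here is a minimal sketch. It is not taken from the patch: the enum, the function names and the boolean parameters are hypothetical stand-ins for the checks in heap_multi_insert(). Both functions behave identically; only the position and phrasing of the comments differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome type for this sketch; not part of PostgreSQL. */
typedef enum DemoVmAction
{
	DEMO_LEAVE_AS_IS,
	DEMO_CLEAR_ALL_VISIBLE,
	DEMO_SET_ALL_FROZEN
} DemoVmAction;

/* Style 1: one block comment ahead of the "if" covers both branches. */
DemoVmAction
demo_comment_above(bool page_all_visible, bool insert_frozen,
				   bool all_frozen_set)
{
	/*
	 * If the page is all-visible, clear that, unless we're only going to
	 * add further frozen rows to it.
	 *
	 * If we're only adding already frozen rows to a previously empty page,
	 * mark it as all-frozen.
	 */
	if (page_all_visible && !insert_frozen)
		return DEMO_CLEAR_ALL_VISIBLE;
	else if (all_frozen_set)
		return DEMO_SET_ALL_FROZEN;

	return DEMO_LEAVE_AS_IS;
}

/* Style 2: each comment moves down a nesting level and drops the "if". */
DemoVmAction
demo_comment_inside(bool page_all_visible, bool insert_frozen,
					bool all_frozen_set)
{
	if (page_all_visible && !insert_frozen)
	{
		/* The page is all-visible and we're adding unfrozen rows: clear it. */
		return DEMO_CLEAR_ALL_VISIBLE;
	}
	else if (all_frozen_set)
	{
		/*
		 * We're only adding frozen rows to a previously empty page, so mark
		 * it as all-frozen.
		 */
		return DEMO_SET_ALL_FROZEN;
	}

	return DEMO_LEAVE_AS_IS;
}
```

The second style only reads well when every branch gets the same treatment, which is the concern raised above.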

The rest of what's now 0001 looks OK to me now, although you might
want to wait for a review from somebody more knowledgeable about this
area.

Some very quick comments on the next few patches -- far from a full review:

0002. Looks boring, probably unobjectionable provided the payoff patch is OK.

0003. What you've done here with xl_heap_prune.flags is kind of
horrifying. The problem is that, while you've added code explaining
that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
nobody who isn't looking directly at that comment is going to
understand the muddling of the two namespaces. I would suggest not
doing this, even if it means defining redundant constants and writing
technically-unnecessary code to translate between them.

0004. It is not clear to me why you need to get
log_heap_prune_and_freeze to do the work here. Why can't
log_newpage_buffer get the job done already?

0005. It looks a little curious that you delete the
identify-corruption logic from the end of the if-nest and add it to
the beginning. Ceteris paribus, you'd expect that to be worse, since
corruption is a rare case.

0006. "to me marked" -> "to be marked".

+                * If the heap page is all-visible but the VM bit is not set, we don't
+                * need to dirty the heap page.  However, if checksums are enabled, we
+                * do need to make sure that the heap page is dirtied before passing
+                * it to visibilitymap_set(), because it may be logged.
                 */
-               PageSetAllVisible(page);
-               MarkBufferDirty(buf);
+               if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+               {
+                       PageSetAllVisible(page);
+                       MarkBufferDirty(buf);
+               }

I really hate this. Maybe you're going to argue that it's not the job
of this patch to fix the awfulness here, but surely marking a buffer
dirty in case some other function decides to WAL-log it is a
ridiculous plan.

-- 
Robert Haas
EDB: http://www.enterprisedb.com






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-09 23:07  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-09 23:07 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the review! I've made the changes to comments and minor
fixes you suggested in attached v13 and have limited my inline
responses to areas where further discussion is required.

On Tue, Sep 9, 2025 at 3:26 PM Robert Haas <[email protected]> wrote:
>
> 0003. What you've done here with xl_heap_prune.flags is kind of
> horrifying. The problem is that, while you've added code explaining
> that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
> nobody who isn't looking directly at that comment is going to
> understand the muddling of the two namespaces. I would suggest not
> doing this, even if it means defining redundant constants and writing
> technically-unnecessary code to translate between them.

Fair. I've introduced new XLHP flags in attached v13. Hopefully it
puts an end to the horror.
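As a rough illustration of keeping the two namespaces apart, the sketch below translates between VM bits and record flags explicitly. Every name and bit value in it is an assumption made for this example (the DEMO_ prefix marks them as such); the actual flag names and positions in v13 may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the visibility map bits; values assumed for this sketch. */
#define DEMO_VM_ALL_VISIBLE		0x01
#define DEMO_VM_ALL_FROZEN		0x02
#define DEMO_VM_VALID_BITS		(DEMO_VM_ALL_VISIBLE | DEMO_VM_ALL_FROZEN)

/* Hypothetical WAL record flags living in their own namespace. */
#define DEMO_XLHP_VM_ALL_VISIBLE	(1 << 8)
#define DEMO_XLHP_VM_ALL_FROZEN		(1 << 9)

/* Translate VM bits into record flags when building the WAL record. */
uint16_t
demo_vmflags_to_xlhp(uint8_t vmflags)
{
	uint16_t	xlhp = 0;

	/* Only valid VM bits may be passed in. */
	assert((vmflags & DEMO_VM_VALID_BITS) == vmflags);

	if (vmflags & DEMO_VM_ALL_VISIBLE)
		xlhp |= DEMO_XLHP_VM_ALL_VISIBLE;
	if (vmflags & DEMO_VM_ALL_FROZEN)
		xlhp |= DEMO_XLHP_VM_ALL_FROZEN;

	return xlhp;
}

/* Translate record flags back into VM bits during replay. */
uint8_t
demo_xlhp_to_vmflags(uint16_t xlhp)
{
	uint8_t		vmflags = 0;

	if (xlhp & DEMO_XLHP_VM_ALL_VISIBLE)
		vmflags |= DEMO_VM_ALL_VISIBLE;
	if (xlhp & DEMO_XLHP_VM_ALL_FROZEN)
		vmflags |= DEMO_VM_ALL_FROZEN;

	return vmflags;
}
```

The translation is technically redundant, but a reader of either header sees only constants that belong to that header.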

> 0004. It is not clear to me why you need to get
> log_heap_prune_and_freeze to do the work here. Why can't
> log_newpage_buffer get the job done already?

Well, I need something to emit the changes to the VM. I'm eliminating
all users of xl_heap_visible. Empty pages are the ones that benefit
the least from switching from xl_heap_visible -> xl_heap_prune. But,
if I don't transition them, we have to maintain all the
xl_heap_visible code (including visibilitymap_set() in its long form).

As for log_newpage_buffer(): if you think it is too confusing to
change log_heap_prune_and_freeze()'s API (by passing force_heap_fpi)
to handle this case, I can leave log_newpage_buffer() there and then
call log_heap_prune_and_freeze().

I just thought it seemed simple to avoid emitting the new page record
and the VM update record, so why not -- but I don't have strong
feelings.
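For reference, the buffer-registration decision that force_heap_fpi feeds into can be modelled in isolation. This is a sketch that mirrors the if/else-if in the attached 0004; the DEMO_ constants are local stand-ins rather than the real REGBUF_* definitions, and hint_bits_logged stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the REGBUF_* flags; values assumed for this sketch. */
#define DEMO_REGBUF_FORCE_IMAGE	0x01
#define DEMO_REGBUF_NO_IMAGE	0x02
#define DEMO_REGBUF_STANDARD	0x08

/*
 * Decide how the heap buffer is registered with the WAL record.
 * hint_bits_logged models XLogHintBitIsNeeded().
 */
uint8_t
demo_choose_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
						 bool set_pd_all_vis, bool hint_bits_logged)
{
	uint8_t		flags = DEMO_REGBUF_STANDARD;

	if (force_heap_fpi)
	{
		/* Page was never WAL-logged: replay needs a full page image. */
		flags |= DEMO_REGBUF_FORCE_IMAGE;
	}
	else if (!do_prune &&
			 nfrozen == 0 &&
			 (!set_pd_all_vis || !hint_bits_logged))
	{
		/*
		 * The only heap change is PD_ALL_VISIBLE and hint bits are not
		 * logged, so no image is needed.
		 */
		flags |= DEMO_REGBUF_NO_IMAGE;
	}

	return flags;
}
```

In the empty-page case the caller would pass force_heap_fpi as true exactly when the page LSN is invalid, which folds the old log_newpage_buffer() call into the same record.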

> 0005. It looks a little curious that you delete the
> identify-corruption logic from the end of the if-nest and add it to
> the beginning. Ceteris paribus, you'd expect that to be worse, since
> corruption is a rare case.

On master, the two corruption cases are sandwiched between the normal
VM set cases. And I actually think doing it this way is brittle. If
you put the cases which set the VM first, you have to completely
bulletproof the if statements guarding them to foreclose any possible
corruption case from entering, because otherwise you will overwrite
the corruption you then try to detect.

But, specifically, from a performance perspective:

I think moving up the third case doesn't matter because the check is so cheap:

    else if (presult.lpdead_items > 0 && PageIsAllVisible(page))

And as for moving up the second case (the other corruption case), the
non-cheap thing it does is call visibilitymap_get_status()

    else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
             visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)

But once you call visibilitymap_get_status() once, assuming there is
no corruption and you need to go set the VM, you've already got that
page of the VM read, so it is probably pretty cheap. Overall, I didn't
think this would add noticeable overhead or many wasted operations.

And I thought that reorganizing the code improved clarity as well as
decreased the likelihood of bugs from insufficiently guarding positive
cases against corrupt pages and overwriting corruption instead of
detecting it.

If we're really worried about it from a performance perspective, I
could add an extra test at the top of identify_and_fix_vm_corruption()
that dumps out early if (!all_visible_according_to_vm &&
presult.all_visible).
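The ordering argument can be sketched as a small decision function. This is a toy model under stated assumptions, not the code in 0005: the struct, the enum and demo_decide() are hypothetical, and vm_bit_still_set stands in for the result of visibilitymap_get_status().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what vacuum knows about one heap page. */
typedef struct DemoPageState
{
	bool		all_visible_according_to_vm;	/* VM bit seen before pruning */
	bool		vm_bit_still_set;	/* models visibilitymap_get_status() != 0 */
	bool		page_all_visible;	/* models PageIsAllVisible(page) */
	bool		prune_all_visible;	/* models presult.all_visible */
	int			lpdead_items;		/* models presult.lpdead_items */
} DemoPageState;

typedef enum DemoAction
{
	DEMO_NOTHING,
	DEMO_CLEAR_VM_ONLY,			/* VM bit set but page flag is not */
	DEMO_CLEAR_PAGE_AND_VM,		/* page flag set despite dead items */
	DEMO_SET_VM
} DemoAction;

DemoAction
demo_decide(const DemoPageState *s)
{
	/* Optional cheap early exit: the common "about to set the VM" case. */
	bool		skip_checks = !s->all_visible_according_to_vm &&
		s->prune_all_visible;

	/* Corruption cases run first so a later VM set cannot mask them. */
	if (!skip_checks)
	{
		if (s->all_visible_according_to_vm && !s->page_all_visible &&
			s->vm_bit_still_set)
			return DEMO_CLEAR_VM_ONLY;
		if (s->lpdead_items > 0 && s->page_all_visible)
			return DEMO_CLEAR_PAGE_AND_VM;
	}

	/* Only now consider the normal case of setting the VM. */
	if (s->prune_all_visible && !s->all_visible_according_to_vm)
		return DEMO_SET_VM;

	return DEMO_NOTHING;
}
```

With the corruption checks in front, the guard on the set-VM branch can stay simple, because any corrupt state has already returned.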

> +                * If the heap page is all-visible but the VM bit is not set, we don't
> +                * need to dirty the heap page.  However, if checksums are enabled, we
> +                * do need to make sure that the heap page is dirtied before passing
> +                * it to visibilitymap_set(), because it may be logged.
>                  */
> -               PageSetAllVisible(page);
> -               MarkBufferDirty(buf);
> +               if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
> +               {
> +                       PageSetAllVisible(page);
> +                       MarkBufferDirty(buf);
> +               }
>
> I really hate this. Maybe you're going to argue that it's not the job
> of this patch to fix the awfulness here, but surely marking a buffer
> dirty in case some other function decides to WAL-log it is a
> ridiculous plan.

Right, it isn't pretty. But I don't quite see what the alternative is.
We need to mark the buffer dirty before setting the LSN. We could
perhaps rewrite visibilitymap_set()'s API to return the LSN of the
xl_heap_visible record and stamp it on the heap buffer ourselves. But
1) I think visibilitymap_set() purposefully conceals its WAL logging
ways from the caller and propagating that info back up starts to make
the API messy in another way and 2) I'm a bit loath to make big
changes to visibilitymap_set() right now since my patch set eventually
resolves this by putting the changes to the VM and heap page in the
same WAL record.

- Melanie


Attachments:

  [text/x-patch] v13-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch (5.9K, 2-v13-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch)
From 6d9a4502319e125d4fa5350cf63019427afba066 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v13 04/20] Use xl_heap_prune record for setting empty pages
 all-visible

As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/pruneheap.c  | 14 +++++--
 src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
 src/include/access/heapam.h          |  1 +
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 680c0562322..343ab55e527 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  false,
 									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  bool force_heap_fpi,
 						  Buffer vmbuffer,
 						  uint8 vmflags,
 						  bool set_pd_all_vis,
@@ -2096,13 +2101,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 
 	regbuf_flags = REGBUF_STANDARD;
 
+	if (force_heap_fpi)
+		regbuf_flags |= REGBUF_FORCE_IMAGE;
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+	else if (!do_prune &&
+			 nfrozen == 0 &&
+			 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51067264004..a1cdaaebb57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,49 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+				VISIBILITYMAP_ALL_FROZEN;
+
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			visibilitymap_set_vmbits(vacrel->rel, blkno,
+									 vmbuffer, new_vmbits);
+
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, if the page hasn't
+				 * been previously WAL-logged, force a heap FPI.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  PageGetLSN(page) == InvalidXLogRecPtr,
+										  vmbuffer,
+										  new_vmbits,
+										  true,
+										  InvalidTransactionId,
+										  false, PRUNE_VACUUM_SCAN,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/* Count the newly all-frozen pages for logging */
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+			/* Count the newly all-frozen pages for logging. */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 		}
@@ -2925,6 +2941,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
+								  false,
 								  vmbuffer,
 								  vmflags,
 								  set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 13934cb7dc6..7ec270feed0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  bool force_heap_fpi,
 									  Buffer vmbuffer,
 									  uint8 vmflags,
 									  bool set_pd_all_vis,
-- 
2.43.0



  [text/x-patch] v13-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (12.1K, 3-v13-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch)
From 58f0e628e901766697bcbbfbaeb7abe685f23d54 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v13 01/20] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++------
 src/backend/access/heap/heapam_xlog.c   | 54 +++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 154 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..c8cd9d22726 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 * going to add further frozen rows to it.
 		 *
 		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..faa7c561a8a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * Note that the heap relation may have been dropped or truncated, leading
+	 * us to skip updating the heap block due to the LSN interlock. However,
+	 * even in that case, it's still safe to update the visibility map. Any
+	 * WAL record that clears the visibility map bit does so before checking
+	 * the page LSN, so any bits that need to be cleared will still be
+	 * cleared.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		visibilitymap_set_vmbits(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0



  [text/x-patch] v13-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch (5.5K, 4-v13-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch)
From a0d48ef7ba1b28c084a6a5bde4e27c6af6fb9820 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v13 02/20] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..8a84bdfe0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2912,8 +2916,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3596,10 +3600,18 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
  *
  * This is a stripped down version of lazy_scan_prune().  If you change
  * anything here, make sure that everything stays in sync.  Note that an
@@ -3607,9 +3619,11 @@ dead_items_cleanup(LVRelState *vacrel)
  * introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3632,7 +3646,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3656,9 +3670,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3679,7 +3693,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3714,7 +3728,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [text/x-patch] v13-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (29.6K, 5-v13-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch)
From c0ae74215319dcd3e79ccd3f15091cf40cf5a692 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v13 03/20] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.

This can decrease the number of WAL records vacuum phase III emits by
as much as half.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
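As a reading aid for the flag handling in this patch, here is a minimal
standalone sketch of the translation between the VM bits and the XLHP_VM_*
bits in xl_heap_prune.flags. The XLHP_* values are taken from the
heapam_xlog.h hunk below, and the VISIBILITYMAP_* values are the ones in
visibilitymapdefs.h. The helper names vmflags_to_xlhp() and
xlhp_to_vmflags() are hypothetical and do not exist in the patch, which
open-codes the same logic in log_heap_prune_and_freeze() and
heap_xlog_prune_freeze().

```c
#include <assert.h>
#include <stdint.h>

/* Values from the heapam_xlog.h hunk of this patch. */
#define XLHP_VM_ALL_VISIBLE         (1 << 0)
#define XLHP_VM_ALL_FROZEN          (1 << 1)
#define XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)

/* Values from visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02
#define VISIBILITYMAP_VALID_BITS    0x03

/*
 * Hypothetical helper: VM bits -> WAL record flags (logging side).
 * ALL_FROZEN is only carried over when ALL_VISIBLE is also set.
 */
static uint16_t
vmflags_to_xlhp(uint8_t vmflags)
{
    uint16_t    flags = 0;

    if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
    {
        flags |= XLHP_VM_ALL_VISIBLE;
        if (vmflags & VISIBILITYMAP_ALL_FROZEN)
            flags |= XLHP_VM_ALL_FROZEN;
    }
    return flags;
}

/*
 * Hypothetical helper: WAL record flags -> VM bits (redo side).
 * All other XLHP_* bits are ignored.
 */
static uint8_t
xlhp_to_vmflags(uint16_t xlhp_flags)
{
    uint8_t     vmflags = 0;

    if (xlhp_flags & XLHP_VM_ALL_VISIBLE)
    {
        vmflags = VISIBILITYMAP_ALL_VISIBLE;
        if (xlhp_flags & XLHP_VM_ALL_FROZEN)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
    }
    return vmflags;
}
```

Shifting the existing flags up by one bit moves XLHP_HAS_NOW_UNUSED_ITEMS to
(1 << 8), which no longer fits in a uint8. That is why the flags member and
the deserialization function's parameter are widened to uint16.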
 src/backend/access/heap/heapam_xlog.c  | 160 ++++++++++++++++++++----
 src/backend/access/heap/pruneheap.c    |  71 ++++++++++-
 src/backend/access/heap/vacuumlazy.c   | 166 +++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |  11 +-
 src/include/access/heapam.h            |   9 ++
 src/include/access/heapam_xlog.h       |  40 ++++--
 6 files changed, 362 insertions(+), 95 deletions(-)
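
The redo-side rules for when the heap buffer is dirtied and when its LSN is
bumped are easy to misread in diff form, so here is a small standalone model
of that decision as I read it from the heap_xlog_prune_freeze() hunk. The
struct and function names are hypothetical. hint_bits_need_wal stands in for
XLogHintBitIsNeeded(), and vm_bits_requested stands in for
(vmflags & VISIBILITYMAP_VALID_BITS) != 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: what redo does to the heap page. */
typedef struct RedoHeapActions
{
    bool        set_pd_all_visible; /* set PD_ALL_VISIBLE on the page */
    bool        mark_buffer_dirty;  /* call MarkBufferDirty() */
    bool        set_heap_lsn;       /* call PageSetLSN() */
} RedoHeapActions;

/*
 * Hypothetical model of the decision in heap_xlog_prune_freeze() once the
 * heap block needs redo.
 */
static RedoHeapActions
decide_redo_heap_actions(bool do_prune, int nplans,
                         bool vm_bits_requested,
                         bool page_already_all_visible,
                         bool hint_bits_need_wal)
{
    RedoHeapActions a;

    /* Pruning or freezing always dirties the page and bumps its LSN. */
    a.mark_buffer_dirty = a.set_heap_lsn = (do_prune || nplans > 0);

    /* PD_ALL_VISIBLE is only set when VM bits will be set as well. */
    a.set_pd_all_visible = vm_bits_requested && !page_already_all_visible;

    if (a.set_pd_all_visible)
    {
        /*
         * If setting PD_ALL_VISIBLE is the only change, the LSN is bumped
         * only when checksums or wal_log_hints are enabled.
         */
        if (hint_bits_need_wal)
            a.set_heap_lsn = true;
        a.mark_buffer_dirty = true;
    }
    return a;
}
```

For a VM-only record with checksums and wal_log_hints disabled, this yields a
dirty heap buffer with an unchanged page LSN, which is the case the
REGBUF_NO_IMAGE logic in log_heap_prune_and_freeze() pairs with.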

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index faa7c561a8a..83b39a9102c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+	{
+		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -89,6 +102,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Size		datalen;
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
+		bool		do_prune;
+		bool		mark_buffer_dirty;
+		bool		set_heap_lsn;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +113,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+		set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +161,127 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+		 * also going to set bits in the VM later.
+		 *
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+
+			/*
+			 * If the only change to the heap page is setting PD_ALL_VISIBLE,
+			 * we can avoid setting the page LSN unless checksums or
+			 * wal_log_hints are enabled.
+			 */
+			set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+			mark_buffer_dirty = true;
+		}
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+		if (set_heap_lsn)
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or will be setting a page in
+	 * the visibility map, measure the page's freespace to later update the
+	 * freespace map.
+	 *
+	 * Even if we are just updating the VM (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since the FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode), which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			vmflags & VISIBILITYMAP_VALID_BITS)
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
-			UnlockReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
+	}
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * Note that the heap relation may have been dropped or truncated, leading
+	 * us to skip updating the heap block due to the LSN interlock. However,
+	 * even in that case, it's still safe to update the visibility map. Any
+	 * WAL record that clears the visibility map bit does so before checking
+	 * the page LSN, so any bits that need to be cleared will still be
+	 * cleared.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+
+		FreeFakeRelcacheEntry(reln);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..680c0562322 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0, false,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer,
+						  uint8 vmflags,
+						  bool set_pd_all_vis,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,16 +2088,34 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
 
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
+
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
 	 * buffer, but we pretend that they are.  When XLogInsert stores a full
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2172,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+	{
+		xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+			xlrec.flags |= XLHP_VM_ALL_FROZEN;
+	}
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2168,5 +2210,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * If pruning or freezing tuples or setting the page all-visible when
+	 * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+	 * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+	 * but this is deemed okay for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a84bdfe0a9..51067264004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2852,8 +2854,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2864,6 +2869,23 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2883,6 +2905,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		Assert(!PageIsAllVisible(page));
+		set_pd_all_vis = true;
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2892,7 +2925,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer,
+								  vmflags,
+								  set_pd_all_vis,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2901,39 +2937,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3600,40 +3609,85 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
  *
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of LP_DEAD items that the caller knows about
+ * and whose index entries it has already removed. Vacuum calls this before
+ * setting those line pointers LP_UNUSED. So, if the page has no other LP_DEAD
+ * items and is otherwise all-visible, the caller can set it in the VM.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
  *
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
  *
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
  *
  * *logging_offnum will have the OffsetNumber of the current tuple being
  * processed for vacuum's error callback system.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst
+ * the visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal().
+ * If you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3661,9 +3715,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+		{
+			uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..13934cb7dc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
 										   Buffer buffer);
 extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
 									  HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer,
+									  uint8 vmflags,
+									  bool set_pd_all_vis,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..6d759c197a1 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,26 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. Note
+ * that VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN are defined to
+ * the same values as XLHP_VM_ALL_VISIBLE and XLHP_VM_ALL_FROZEN respectively.
+ * However, xl_heap_prune should always use the XLHP flags and translate them
+ * back to their visibilitymapdefs.h equivalent.
+ */
+#define		XLHP_VM_ALL_VISIBLE			(1 << 0)
+#define		XLHP_VM_ALL_FROZEN			(1 << 1)
 
-/* to handle recovery conflict during logical decoding on standby */
-#define		XLHP_IS_CATALOG_REL			(1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define		XLHP_IS_CATALOG_REL			(1 << 2)
 
 /*
  * Does replaying the record require a cleanup-lock?
@@ -305,7 +321,7 @@ typedef struct xl_heap_prune
  * marks LP_DEAD line pointers as unused without moving any tuple data, an
  * ordinary exclusive lock is sufficient.
  */
-#define		XLHP_CLEANUP_LOCK	       (1 << 2)
+#define		XLHP_CLEANUP_LOCK	       (1 << 3)
 
 /*
  * If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +329,22 @@ typedef struct xl_heap_prune
  * there are no queries running for which the removed tuples are still
  * visible, or which still consider the frozen XIDs as running.
  */
-#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 3)
+#define		XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
 
 /*
  * Indicates that an xlhp_freeze_plans sub-record and one or more
  * xlhp_freeze_plan sub-records are present.
  */
-#define		XLHP_HAS_FREEZE_PLANS		(1 << 4)
+#define		XLHP_HAS_FREEZE_PLANS		(1 << 5)
 
 /*
  * XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
  * indicate that xlhp_prune_items sub-records with redirected, dead, and
  * unused item offsets are present.
  */
-#define		XLHP_HAS_REDIRECTIONS		(1 << 5)
-#define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
-#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
+#define		XLHP_HAS_REDIRECTIONS		(1 << 6)
+#define		XLHP_HAS_DEAD_ITEMS	        (1 << 7)
+#define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)
 
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
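
For illustration, here is a minimal standalone sketch of why the flags argument has to widen once the bits are renumbered. The bit values are copied from the hunks above; the two helper functions are hypothetical and exist only for this example, they are not part of PostgreSQL. With XLHP_HAS_NOW_UNUSED_ITEMS moved to (1 << 8), the flag no longer fits in eight bits, which is consistent with the uint8 to uint16 change to heap_xlog_deserialize_prune_and_freeze().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as renumbered by the patch above. */
#define XLHP_VM_ALL_VISIBLE         (1 << 0)
#define XLHP_VM_ALL_FROZEN          (1 << 1)
#define XLHP_IS_CATALOG_REL         (1 << 2)
#define XLHP_CLEANUP_LOCK           (1 << 3)
#define XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
#define XLHP_HAS_FREEZE_PLANS       (1 << 5)
#define XLHP_HAS_REDIRECTIONS       (1 << 6)
#define XLHP_HAS_DEAD_ITEMS         (1 << 7)
#define XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)

/*
 * Hypothetical helper: models what a uint8 parameter would retain by
 * explicitly narrowing the flags before testing the bit.
 */
static bool
flag_survives_uint8(uint16_t flags, uint16_t bit)
{
	uint8_t		narrowed = (uint8_t) flags;	/* drops bit 8 and above */

	return (narrowed & bit) != 0;
}

/* Hypothetical helper: the same test with the full 16-bit flags. */
static bool
flag_survives_uint16(uint16_t flags, uint16_t bit)
{
	return (flags & bit) != 0;
}
```

Under these assumptions, XLHP_HAS_DEAD_ITEMS (bit 7) is still visible after narrowing to eight bits, while XLHP_HAS_NOW_UNUSED_ITEMS (bit 8) is silently lost, so every consumer of the flags needs the wider type.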



  [text/x-patch] v13-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (12.1K, 6-v13-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch)
  download | inline diff:
From 1d875984902501382636e2d537874f6c4b6e6ea4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v13 07/20] Find and fix VM corruption in
 heap_page_prune_and_freeze

Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().

This commit is only really meant for review, as it adds a member to
PruneFreezeResult (vm_corruption) that is removed in later commits.
---
 src/backend/access/heap/pruneheap.c  | 93 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 84 +++----------------------
 src/include/access/heapam.h          |  4 ++
 3 files changed, 102 insertions(+), 79 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 343ab55e527..8f968b47c38 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+			heap_page_prune_and_freeze(relation, buffer, false,
+									   InvalidBuffer,
+									   vistest, 0,
 									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -294,6 +303,70 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +387,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
  * (see heap_prune_satisfies_vacuum).
  *
@@ -349,6 +426,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   bool blk_known_av,
+						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
 						   int options,
 						   struct VacuumCutoffs *cutoffs,
@@ -897,6 +976,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Clear any VM corruption. This does not need to be done in a critical
+	 * section.
+	 */
+	presult->vm_corruption = false;
+	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+																blockno, buffer, page,
+																blk_known_av,
+																prstate.lpdead_items, vmbuffer);
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d6818323932..a222c9f9164 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
-										   BlockNumber heap_blk,
-										   Buffer heap_buffer, Page heap_page,
-										   bool heap_blk_known_av,
-										   int64 nlpdead_items,
-										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,72 +1934,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
-							   BlockNumber heap_blk,
-							   Buffer heap_buffer, Page heap_page,
-							   bool heap_blk_known_av,
-							   int64 nlpdead_items,
-							   Buffer vmbuffer)
-{
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
-		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
-
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2063,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   all_visible_according_to_vm,
+							   vmbuffer,
+							   vacrel->vistest, prune_options,
 							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2152,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables. Start by looking for any VM corruption.
+	 * all_frozen variables.
 	 */
-	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
-									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	if (presult.vm_corruption)
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ec270feed0..a3b85fd1daf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
+	bool		vm_corruption;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   bool blk_known_av,
+									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
 									   int options,
 									   struct VacuumCutoffs *cutoffs,
-- 
2.43.0
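
As a reading aid, the two corruption cases handled by identify_and_fix_vm_corruption() can be modeled without any PostgreSQL headers. This is only a sketch: ToyPage and toy_identify_and_fix_vm_corruption() are hypothetical names, the struct fields stand in for PD_ALL_VISIBLE, the VM status bits and the LP_DEAD count, and buffer locking, WARNING reporting and MarkBufferDirty() are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap page plus its visibility map entry. */
typedef struct ToyPage
{
	bool		pd_all_visible;	/* models the page-level PD_ALL_VISIBLE bit */
	uint8_t		vm_bits;		/* models the VM status; nonzero = bit(s) set */
	int			lpdead_items;	/* models the LP_DEAD items found by pruning */
} ToyPage;

/*
 * Returns true if corruption was found and cleared, mirroring the return
 * convention of the function in the patch.
 */
static bool
toy_identify_and_fix_vm_corruption(ToyPage *page, bool blk_known_av)
{
	/* Case 1: VM bit set although the page-level bit is clear. */
	if (blk_known_av && !page->pd_all_visible && page->vm_bits != 0)
	{
		page->vm_bits = 0;
		return true;
	}

	/* Case 2: LP_DEAD items on a page marked all-visible. */
	if (page->lpdead_items > 0 && page->pd_all_visible)
	{
		page->pd_all_visible = false;
		page->vm_bits = 0;
		return true;
	}

	return false;
}
```

In this model, as in the patch, a true return means the caller should skip the normal VM update for the page, which is what the presult.vm_corruption check in lazy_scan_prune() does.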



  [text/x-patch] v13-0006-Combine-vacuum-phase-I-VM-update-cases.patch (5.8K, 7-v13-0006-Combine-vacuum-phase-I-VM-update-cases.patch)
  download | inline diff:
From 02228c3550e5626bddff072059326baad1ba1e1c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v13 06/20] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.

The combined case also happens to fix a longstanding bug where if we are
only setting an all-visible page all-frozen and checksums/wal_log_hints
are enabled, we would fail to set the buffer dirty before setting the
page LSN in visibilitymap_set().
---
 src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
 1 file changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9b4e924d22..d6818323932 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2159,11 +2159,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		/* Don't update the VM if we just cleared corruption in it */
 	}
-	else if (!all_visible_according_to_vm && presult.all_visible)
+
+	/*
+	 * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+	 * valid if all_visible is true, so we must check both all_visible and
+	 * all_frozen.
+	 */
+	else if (presult.all_visible &&
+			 (!all_visible_according_to_vm ||
+			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2176,21 +2191,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * If the heap page is all-visible but the VM bit is not set, we don't
+		 * need to dirty the heap page.  However, if checksums are enabled, we
+		 * do need to make sure that the heap page is dirtied before passing
+		 * it to visibilitymap_set(), because it may be logged.
 		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
+		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check the return value of old_vmbits to ensure the
+		 * visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2211,66 +2234,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
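
The claim that the two old branches collapse into one condition can be checked exhaustively. The sketch below compares only the branch entry conditions over all 16 input combinations; the function names are hypothetical, vm_all_frozen stands in for the result of VM_ALL_FROZEN(), and the differences in the branch bodies (for example dirtying the buffer when XLogHintBitIsNeeded()) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Entry conditions of the two separate else-if branches before the patch. */
static bool
old_should_update_vm(bool av_according_to_vm, bool all_visible,
					 bool all_frozen, bool vm_all_frozen)
{
	if (!av_according_to_vm && all_visible)
		return true;
	if (av_according_to_vm && all_visible && all_frozen && !vm_all_frozen)
		return true;
	return false;
}

/* Entry condition of the single combined branch after the patch. */
static bool
new_should_update_vm(bool av_according_to_vm, bool all_visible,
					 bool all_frozen, bool vm_all_frozen)
{
	return all_visible &&
		(!av_according_to_vm || (all_frozen && !vm_all_frozen));
}

/* Counts input combinations on which the old and new conditions disagree. */
static int
count_mismatches(void)
{
	int			mismatches = 0;

	for (int i = 0; i < 16; i++)
	{
		bool		av_vm = (i & 1) != 0;
		bool		all_visible = (i & 2) != 0;
		bool		all_frozen = (i & 4) != 0;
		bool		vm_all_frozen = (i & 8) != 0;

		if (old_should_update_vm(av_vm, all_visible, all_frozen, vm_all_frozen) !=
			new_should_update_vm(av_vm, all_visible, all_frozen, vm_all_frozen))
			mismatches++;
	}
	return mismatches;
}
```

The conditions agree because, once all_visible is required, "!av_vm || (av_vm && X)" reduces to "!av_vm || X".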



  [text/x-patch] v13-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (3.1K, 8-v13-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch)
  download | inline diff:
From 12130b4f6fa88c9e748d45da860a5f8b1a7dd289 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v13 08/20] Keep all_frozen updated too in
 heap_page_prune_and_freeze

We previously relied on only using all-visible and all-frozen together
but it's best to keep them both updated.

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8f968b47c38..3be4ae3ae2a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -830,6 +826,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -1474,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1496,7 +1493,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1509,7 +1506,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1528,7 +1525,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1546,7 +1543,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
-- 
2.43.0
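
The invariant this patch establishes, that all_frozen is never true while all_visible is false, can be shown with a small model. ToyPruneState and both helpers are hypothetical names used only here; the real PruneState has many more fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of PruneState to the two fields discussed above. */
typedef struct ToyPruneState
{
	bool		all_visible;
	bool		all_frozen;
} ToyPruneState;

/* Models the patched assignments: clearing all_visible also clears all_frozen. */
static void
toy_mark_not_all_visible(ToyPruneState *prstate)
{
	prstate->all_visible = prstate->all_frozen = false;
}

/* Models the Assert() added before the critical section. */
static bool
toy_invariant_holds(const ToyPruneState *prstate)
{
	return !prstate->all_frozen || prstate->all_visible;
}
```

Before the patch, a state with all_visible false and all_frozen true was tolerated because callers only consulted all_frozen when all_visible was set; the model above treats that state as an invariant violation.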



  [text/x-patch] v13-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (7.4K, 9-v13-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch)
  download | inline diff:
From cb4fb780867cd8736d4d7d5b8a49089a6105fee2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v13 05/20] Combine lazy_scan_prune VM corruption cases

lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.

Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.

This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
 src/backend/access/heap/vacuumlazy.c | 126 +++++++++++++++++----------
 1 file changed, 79 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a1cdaaebb57..e9b4e924d22 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+										   BlockNumber heap_blk,
+										   Buffer heap_buffer, Page heap_page,
+										   bool heap_blk_known_av,
+										   int64 nlpdead_items,
+										   Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1934,6 +1940,72 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+							   BlockNumber heap_blk,
+							   Buffer heap_buffer, Page heap_page,
+							   bool heap_blk_known_av,
+							   int64 nlpdead_items,
+							   Buffer vmbuffer)
+{
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+		visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+
 /* qsort comparator for sorting OffsetNumbers */
 static int
 cmpOffsetNumbers(const void *a, const void *b)
@@ -2080,9 +2152,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables. Start by looking for any VM corruption.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+									   all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+	{
+		/* Don't update the VM if we just cleared corruption in it */
+	}
+	else if (!all_visible_according_to_vm && presult.all_visible)
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2134,51 +2211,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		}
 	}
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-- 
2.43.0



  [text/x-patch] v13-0010-Rename-PruneState.freeze-to-attempt_freeze.patch (3.7K, 10-v13-0010-Rename-PruneState.freeze-to-attempt_freeze.patch)
  download | inline diff:
From 42817d95ec5bda5fb164167825b585731e4fdc70 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v13 10/20] Rename PruneState.freeze to attempt_freeze

This makes it clearer that the field indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
---
 src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ae59242a843..44b186a4560 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -445,13 +445,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
-	bool		hint_bit_fpi;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -473,7 +473,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -520,7 +520,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -634,7 +634,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -750,7 +750,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -783,7 +783,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
@@ -1046,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->old_vmbits = old_vmbits;
 	presult->new_vmbits = vmflags;
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1628,7 +1628,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v13-0009-Update-VM-in-pruneheap.c.patch (12.5K, 11-v13-0009-Update-VM-in-pruneheap.c.patch)
  download | inline diff:
From 4305a7ac8c4b1ecbb863e4df9b293c8cf1b7a4e8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v13 09/20] Update VM in pruneheap.c

As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
 src/backend/access/heap/pruneheap.c  | 99 +++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
 src/include/access/heapam.h          | 15 +++--
 3 files changed, 106 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3be4ae3ae2a..ae59242a843 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,7 +366,8 @@ identify_and_fix_vm_corruption(Relation relation,
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -442,6 +443,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint;
+	uint8		vmflags = 0;
+	uint8		old_vmbits = 0;
 	bool		hint_bit_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -942,7 +945,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 *
 	 * Now that freezing has been finalized, unset all_visible if there are
 	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * of the page, as expected for updating the visibility map.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -958,31 +961,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->hastup = prstate.hastup;
 
 	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so the VM update record doesn't need it.
 	 */
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
 		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
 
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
 	/*
-	 * Clear any VM corruption. This does not need to be done in a critical
-	 * section.
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
 	 */
-	presult->vm_corruption = false;
 	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
-		presult->vm_corruption = identify_and_fix_vm_corruption(relation,
-																blockno, buffer, page,
-																blk_known_av,
-																prstate.lpdead_items, vmbuffer);
+	{
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av,
+										   prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/*
+		 * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+		 * is only valid if all_visible is true, so we must check both
+		 * all_visible and all_frozen.
+		 */
+		else if (presult->all_visible &&
+				 (!blk_known_av ||
+				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			Assert(prstate.lpdead_items == 0);
+			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			/*
+			 * If the page is all-frozen, we can pass InvalidTransactionId as
+			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+			 * make everything safe for REDO was logged when the page's tuples
+			 * were frozen.
+			 */
+			if (presult->all_frozen)
+			{
+				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			}
+
+			/*
+			 * It's possible for the VM bit to be clear and the page-level bit
+			 * to be set if checksums are not enabled.
+			 *
+			 * And even if we are just planning to update the frozen bit in
+			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
+			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+			 * might have become stale.
+			 *
+			 * If the heap page is all-visible but the VM bit is not set, we
+			 * don't need to dirty the heap page.  However, if checksums are
+			 * enabled, we do need to make sure that the heap page is dirtied
+			 * before passing it to visibilitymap_set(), because it may be
+			 * logged.
+			 */
+			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+			{
+				PageSetAllVisible(page);
+				MarkBufferDirty(buffer);
+			}
+
+			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+										   vmbuffer, presult->vm_conflict_horizon,
+										   vmflags);
+		}
+	}
+
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
+
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = vmflags;
+
 	if (prstate.freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a222c9f9164..9492423141e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,7 +1949,8 @@ cmpOffsetNumbers(const void *a, const void *b)
  * vmbuffer is the buffer containing the VM block with visibility information
  * for the heap block, blkno. all_visible_according_to_vm is the saved
  * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1979,7 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
+	 * Then, if the page's visibility status has changed, update the VM.
 	 *
 	 * If the relation has no indexes, we can immediately mark would-be dead
 	 * items LP_UNUSED.
@@ -1986,10 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * presult.ndeleted.  It should not be confused with presult.lpdead_items;
 	 * presult.lpdead_items's final value can be thought of as the number of
 	 * tuples that were deleted from indexes.
-	 *
-	 * We will update the VM after collecting LP_DEAD items and freezing
-	 * tuples. Pruning will have determined whether or not the page is
-	 * all-visible.
 	 */
 	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
 	if (vacrel->nindexes == 0)
@@ -2081,88 +2079,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (presult.vm_corruption)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		/* Don't update the VM if we just cleared corruption in it */
-	}
-
-	/*
-	 * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to be marked all-frozen, update the VM. Note that all_frozen is only
-	 * valid if all_visible is true, so we must check both all_visible and
-	 * all_frozen.
-	 */
-	else if (presult.all_visible &&
-			 (!all_visible_according_to_vm ||
-			  (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * If the heap page is all-visible but the VM bit is not set, we don't
-		 * need to dirty the heap page.  However, if checksums are enabled, we
-		 * do need to make sure that the heap page is dirtied before passing
-		 * it to visibilitymap_set(), because it may be logged.
-		 */
-		if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
 
 	return presult.ndeleted;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a3b85fd1daf..9952ae96b12 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * all_visible and all_frozen indicate the status of the page as reflected
+	 * in the visibility map after pruning, freezing, and setting any pages
+	 * all-visible in the visibility map.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * vm_conflict_horizon is the newest xmin of live tuples on the page
+	 * (older than OldestXmin).  It will only be valid if we did not set the
+	 * page all-frozen in the VM.
 	 *
 	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
-	bool		vm_corruption;
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
-- 
2.43.0
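
For readers following the control flow of the patch above, here is a minimal
standalone sketch of two decisions it makes: which VM bits to request for a
page, and how lazy_scan_prune() classifies the old_vmbits/new_vmbits pair for
its logging counters. It is not code from the patch set. The function names
choose_vmflags() and classify_vm_change(), the VMChange enum, and the plain
bool parameter vm_all_frozen (standing in for the VM_ALL_FROZEN() lookup) are
all hypothetical, and the VM-corruption path is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical enum for illustration; the patch bumps counters directly. */
typedef enum VMChange
{
	VM_CHANGE_NONE,
	VM_CHANGE_NEWLY_VISIBLE,
	VM_CHANGE_NEWLY_VISIBLE_FROZEN,
	VM_CHANGE_NEWLY_FROZEN
} VMChange;

/*
 * Decide which VM bits to request.  Mirrors the "else if" condition in the
 * patch: the page must be all-visible, and either the VM did not already
 * show it as all-visible, or it is all-frozen and the VM lacks that bit.
 * Returns 0 when no VM update is wanted.
 */
static uint8_t
choose_vmflags(bool all_visible, bool all_frozen,
			   bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		vmflags = 0;

	if (all_visible &&
		(!blk_known_av || (all_frozen && !vm_all_frozen)))
	{
		vmflags = VISIBILITYMAP_ALL_VISIBLE;
		/* all_frozen is only meaningful when all_visible is true */
		if (all_frozen)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}

/*
 * Classify a VM update from the bits before (old_vmbits) and the bits
 * requested (new_vmbits), following the counter logic in lazy_scan_prune().
 */
static VMChange
classify_vm_change(uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
			return VM_CHANGE_NEWLY_VISIBLE_FROZEN;
		return VM_CHANGE_NEWLY_VISIBLE;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* The frozen bit is never set without the visible bit */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		return VM_CHANGE_NEWLY_FROZEN;
	}
	return VM_CHANGE_NONE;
}
```

In this sketch, a page that was already all-visible in the VM and has just
become all-frozen yields vmflags 0x03 and VM_CHANGE_NEWLY_FROZEN, which
corresponds to incrementing vm_new_frozen_pages in the patch. When new_vmbits
is 0 (no update was attempted, or corruption was fixed instead), nothing is
counted.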



  [text/x-patch] v13-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch (29.1K, 12-v13-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch)
  download | inline diff:
From c5ae37129bbfcc7245f052976493a2c81e15a25b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v13 11/20] Eliminate xl_heap_visible from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c  | 460 ++++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  30 --
 src/include/access/heapam.h          |  15 +-
 3 files changed, 283 insertions(+), 222 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 44b186a4560..74f7878c9ac 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+
+	/*
+	 * Whether or not to consider updating the VM. There is some bookkeeping
+	 * that must be maintained if we would like to update the VM.
+	 */
+	bool		consider_update_vm;
+
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
 	 *
 	 * These fields are not used by pruning itself for the most part, but are
 	 * used to collect information about what was pruned and what state the
-	 * page is in after pruning, for the benefit of the caller.  They are
-	 * copied to the caller's PruneFreezeResult at the end.
+	 * page is in after pruning to use when updating the visibility map and
+	 * for the benefit of the caller.  They are copied to the caller's
+	 * PruneFreezeResult at the end.
 	 * -------------------------------------------------------
 	 */
 
@@ -138,11 +146,10 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+	 * directly before updating the VM. We ignore LP_DEAD items when deciding
+	 * whether or not to opportunistically freeze and when determining the
+	 * snapshot conflict horizon required when freezing tuples.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -377,12 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that both old_vmbits and
+ * new_vmbits will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
  *
  * blk_known_av is the visibility status of the heap block as of the last call
  * to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -398,6 +408,8 @@ identify_and_fix_vm_corruption(Relation relation,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -442,18 +454,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
+	bool		do_hint_full_or_prunable;
+	bool		do_set_vm;
 	uint8		vmflags = 0;
 	uint8		old_vmbits = 0;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	bool		all_frozen_except_lp_dead = false;
+	bool		set_pd_all_visible = false;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate.cutoffs = cutoffs;
 
+	Assert(!prstate.consider_update_vm || vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -498,50 +516,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
+	 *
+	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
+	 * other callers could do either one. The visibility bookkeeping is
+	 * required for opportunistic freezing (in addition to setting the VM
+	 * bits) because we only consider opportunistically freezing tuples if the
+	 * whole page would become all-frozen or if the whole page will be frozen
+	 * except for dead tuples that will be removed by vacuum.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If only updating the VM, we must initialize all_frozen to false, as
+	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
+	 * page and we will not end up correctly setting it to false later.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * bookkeeping. Initializing all_visible to false allows skipping the work
+	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.consider_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -739,10 +764,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
+	 * pd_prune_xid field or the page was marked full, we will update those
+	 * hint bits.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_full_or_prunable =
+		((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -790,7 +816,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_full_or_prunable)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -829,11 +855,88 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	all_frozen_except_lp_dead = prstate.all_frozen;
+	if (prstate.lpdead_items > 0)
+	{
+		prstate.all_visible = false;
+		prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last heap_vac_scan_next_block() call), and from all_visible and
+	 * all_frozen variables.
+	 */
+	if (prstate.consider_update_vm)
+	{
+		/*
+		 * Clear any VM corruption. This does not need to be in a critical
+		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+		 * we may mark the heap page buffer dirty here and could end up doing
+		 * so again later. This is not a correctness issue and is in the path
+		 * of VM corruption, so we don't have to worry about the extra
+		 * performance overhead.
+		 */
+		if (identify_and_fix_vm_corruption(relation,
+										   blockno, buffer, page,
+										   blk_known_av, prstate.lpdead_items, vmbuffer))
+		{
+			/* If we fix corruption, don't update the VM further */
+		}
+
+		/* Determine if we actually need to set the VM and which bits to set. */
+		else if (prstate.all_visible &&
+				 (!blk_known_av ||
+				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+		{
+			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+			if (prstate.all_frozen)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+	}
+
+	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_full_or_prunable)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -849,15 +952,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageClearFull(page);
 
 		/*
-		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+		 * this is a non-WAL-logged hint.  If we are going to freeze or prune
+		 * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+		 * dirty and emit WAL below.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_prune && !do_freeze && !set_pd_all_visible)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -871,12 +975,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (set_pd_all_visible)
+			PageSetAllVisible(page);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+		 * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+		 * appropriate here.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (do_prune || do_freeze || set_pd_all_visible)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+												  vmbuffer, vmflags);
+
+			if (old_vmbits == vmflags)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				do_set_vm = false;
+				/* 0 out vmflags so we don't emit WAL to update the VM */
+				vmflags = 0;
+			}
+		}
+
+		/*
+		 * It should never be the case that PD_ALL_VISIBLE is not set and the
+		 * VM is set. Or, if it were, we should have caught it earlier when
+		 * finding and fixing VM corruption. So, if we found out the VM was
+		 * already set above, we should have found PD_ALL_VISIBLE set earlier.
+		 */
+		Assert(!set_pd_all_visible || do_set_vm);
+
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+		 * we were only updating the VM and it turns out it was already set,
+		 * we will have unset do_set_vm earlier. As such, check it again
+		 * before emitting the record.
+		 */
+		if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -888,35 +1027,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
+
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid.
+			 */
+			if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+				conflict_xid = prstate.visibility_cutoff_xid;
 
 			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
+			 * Otherwise, if we are freezing but the page would not be
+			 * all-frozen, we have to use the more pessimistic horizon of
+			 * OldestXmin, which may be newer than the newest tuple we froze.
+			 * We currently don't track the newest tuple we froze.
 			 */
-			if (do_freeze)
+			else if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
+				conflict_xid = prstate.cutoffs->OldestXmin;
+				TransactionIdRetreat(conflict_xid);
 			}
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
+			/*
+			 * If we are removing tuples with an xmax younger than the horizon
+			 * calculated so far, we must use that xmax as our horizon instead.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
 									  false,
-									  InvalidBuffer, 0, false,
+									  vmbuffer,
+									  vmflags,
+									  set_pd_all_visible,
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -928,124 +1088,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected for updating the visibility map.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
-	presult->hastup = prstate.hastup;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so the VM update record doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion. We will
+	 * have already set the page in the VM, so this assertion will only let
+	 * you know that you've already done something wrong.
 	 */
-	if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
 	{
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av,
-										   prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		/*
-		 * If the page isn't yet marked all-visible in the VM or it is and
-		 * needs to be marked all-frozen, update the VM. Note that all_frozen
-		 * is only valid if all_visible is true, so we must check both
-		 * all_visible and all_frozen.
-		 */
-		else if (presult->all_visible &&
-				 (!blk_known_av ||
-				  (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			Assert(prstate.lpdead_items == 0);
-			vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		Assert(cutoffs);
 
-			/*
-			 * If the page is all-frozen, we can pass InvalidTransactionId as
-			 * our cutoff_xid, since a snapshotConflictHorizon sufficient to
-			 * make everything safe for REDO was logged when the page's tuples
-			 * were frozen.
-			 */
-			if (presult->all_frozen)
-			{
-				Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			}
+		Assert(prstate.lpdead_items == 0);
 
-			/*
-			 * It's possible for the VM bit to be clear and the page-level bit
-			 * to be set if checksums are not enabled.
-			 *
-			 * And even if we are just planning to update the frozen bit in
-			 * the VM, we shouldn't rely on all_visible_according_to_vm as a
-			 * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
-			 * might have become stale.
-			 *
-			 * If the heap page is all-visible but the VM bit is not set, we
-			 * don't need to dirty the heap page.  However, if checksums are
-			 * enabled, we do need to make sure that the heap page is dirtied
-			 * before passing it to visibilitymap_set(), because it may be
-			 * logged.
-			 */
-			if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buffer);
-			}
+		if (!heap_page_is_all_visible(relation, buffer,
+									  cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
 
-			old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
-										   vmbuffer, presult->vm_conflict_horizon,
-										   vmflags);
-		}
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
+#endif
 
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->old_vmbits = old_vmbits;
+	/* new_vmbits was set above */
+	presult->hastup = prstate.hastup;
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = vmflags;
-
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
@@ -1627,7 +1718,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			break;
 	}
 
-	/* Consider freezing any normal tuples which will not be removed */
+	/*
+	 * Consider freezing any normal tuples which will not be removed.
+	 * Regardless of whether or not we want to freeze the tuples, if we want
+	 * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+	 * tuple to know whether or not the page will be totally frozen.
+	 */
 	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
@@ -2190,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
@@ -2248,6 +2344,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 
 	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
 
+	Assert(do_prune || nfrozen > 0 || (vmflags & VISIBILITYMAP_VALID_BITS));
+
 	regbuf_flags = REGBUF_STANDARD;
 
 	if (force_heap_fpi)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9492423141e..75205179b83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2015,34 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2076,8 +2048,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9952ae96b12..3679928d43e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate the status of the page as reflected
-	 * in the visibility map after pruning, freezing, and setting any pages
-	 * all-visible in the visibility map.
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page
-	 * (older than OldestXmin).  It will only be valid if we did not set the
-	 * page all-frozen in the VM.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
 	uint8		old_vmbits;
 	uint8		new_vmbits;
 
-- 
2.43.0
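
The conflict-horizon selection in the heap_page_prune_and_freeze() hunk
above can be modeled outside the server. What follows is only a sketch
under stated assumptions, not the patch's code: HorizonInputs,
choose_conflict_xid(), xid_follows() and xid_retreat() are hypothetical
names, the two helpers are simplified stand-ins for TransactionIdFollows()
and TransactionIdRetreat(), and setting_all_frozen stands in for the
(vmflags & VISIBILITYMAP_ALL_FROZEN) test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Simplified stand-in for TransactionIdFollows(): is id1 newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	/* Normal XIDs compare modulo 2^32. */
	return (int32_t) (id1 - id2) > 0;
}

/* Simplified stand-in for TransactionIdRetreat(): step back one normal XID. */
static TransactionId
xid_retreat(TransactionId xid)
{
	do
	{
		xid--;
	} while (xid < FirstNormalTransactionId);
	return xid;
}

/* Hypothetical bundle of the values the hunk consults. */
typedef struct HorizonInputs
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		all_frozen_except_lp_dead;
	bool		blk_known_av;
	bool		setting_all_frozen;	/* vmflags has the all-frozen bit */
	TransactionId visibility_cutoff_xid;
	TransactionId oldest_xmin;
	TransactionId latest_xid_removed;
} HorizonInputs;

static TransactionId
choose_conflict_xid(const HorizonInputs *in)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* VM update, or freezing a page that will end up all-frozen. */
	if (in->do_set_vm || (in->do_freeze && in->all_frozen_except_lp_dead))
		conflict_xid = in->visibility_cutoff_xid;
	/* Freezing only part of the page: fall back to OldestXmin - 1. */
	else if (in->do_freeze)
		conflict_xid = xid_retreat(in->oldest_xmin);

	/* Removed tuples may push the horizon forward. */
	if (xid_follows(in->latest_xid_removed, conflict_xid))
		conflict_xid = in->latest_xid_removed;

	/* Already all-visible page being marked all-frozen: no conflict. */
	if (!in->do_prune && !in->do_freeze && in->do_set_vm &&
		in->blk_known_av && in->setting_all_frozen)
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

In this model, freezing part of a page with an OldestXmin of 100 and no
removed tuples yields 99, while a VM-only update of a page already known
to be all-visible that sets the all-frozen bit yields InvalidTransactionId.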



  [text/x-patch] v13-0012-Remove-xl_heap_visible-entirely.patch (24.7K, 13-v13-0012-Remove-xl_heap_visible-entirely.patch)
From c3dd8565db8e6f0273a497c584f53bd10057b4b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v13 12/20] Remove xl_heap_visible entirely

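There are now no users of xl_heap_visible, so remove the record type
along with log_heap_visible() and heap_xlog_visible(). With the old
visibilitymap_set() gone, rename visibilitymap_set_vmbits() to
visibilitymap_set().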
---
 src/backend/access/common/bufmask.c      |   3 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 158 +----------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  10 +-
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  11 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 32 insertions(+), 370 deletions(-)
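
Callers in this series compare the value returned by the VM setter against
the flags they passed in to decide whether anything changed (and so whether
the VM update needs WAL). A minimal sketch of that contract, assuming the
renamed function keeps the bit arithmetic visible in the removed
visibilitymap_set(): vm_set_bits() and the VM_* names are hypothetical, and
the map is a flat byte array rather than a buffer page, so the split into
map blocks, pinning, locking and MarkBufferDirty() are all left out.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK		2
#define HEAPBLOCKS_PER_BYTE		4

#define VM_ALL_VISIBLE			0x01
#define VM_ALL_FROZEN			0x02
#define VM_VALID_BITS			0x03

/*
 * Set 'flags' for heap block 'heap_blk' in 'map' and return the bits that
 * were set beforehand.  Bits are only ever added here, never cleared.
 */
static uint8_t
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert((flags & VM_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible. */
	assert(flags != VM_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VM_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

A caller that gets back exactly the flags it asked for knows the map was
already in the requested state, which is the case the prune path uses to
reset do_set_vm and skip the VM portion of the record.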

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c8cd9d22726..dfa9d5a460d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
@@ -2524,11 +2525,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8799,49 +8800,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
 
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 83b39a9102c..84c2924967d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -247,9 +247,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	 * the VM is set.
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
-	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * without holding a lock on the heap page is considered safe enough.
 	 */
 	if (vmflags & VISIBILITYMAP_VALID_BITS &&
 		XLogReadBufferForRedoExtended(record, 1,
@@ -265,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -284,142 +282,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -796,9 +658,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 * the VM is set.
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
-	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * without holding a lock on the heap page is considered safe enough.
 	 */
 	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -811,15 +671,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
-
 		/*
 		 * It is not possible that the VM was already set for this heap page,
 		 * so the vmbuffer must have been modified and marked dirty.
 		 */
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		FreeFakeRelcacheEntry(reln);
@@ -1400,9 +1259,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 74f7878c9ac..538e82db8e6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -989,8 +989,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_vm)
 		{
 			Assert(PageIsAllVisible(page));
-			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
-												  vmbuffer, vmflags);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, vmflags);
 
 			if (old_vmbits == vmflags)
 			{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75205179b83..2dcca071a45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,8 +1888,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			visibilitymap_set_vmbits(vacrel->rel, blkno,
-									 vmbuffer, new_vmbits);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer, new_vmbits);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2757,9 +2757,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		Assert(!PageIsAllVisible(page));
 		set_pd_all_vis = true;
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
  *
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6d759c197a1..cdd6acbea1c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -450,20 +449,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -507,11 +492,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
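
The visibility map addressing that visibilitymap_set() relies on in the
patch above can be tried outside the server. What follows is a minimal
sketch, not the server code: it assumes the map is a plain byte array
covering a single VM page (so there is no mapBlock computation, buffer
locking, or WAL), and sketch_vm_set() and sketch_vm_get() are
hypothetical names used only for illustration. Each heap block gets two
bits, so four heap blocks share one map byte.

```c
#include <stdint.h>

/* Flag values as in access/visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Two bits per heap block, hence four heap blocks per map byte. */
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/* Hypothetical helper: read the two status bits for heapBlk. */
static inline uint8_t
sketch_vm_get(const uint8_t *map, uint32_t heapBlk)
{
	uint32_t	mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	return (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
}

/*
 * Hypothetical helper: OR the requested flags into the map and return the
 * status the block had before the call, as the removed function did.
 */
static inline uint8_t
sketch_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status = sketch_vm_get(map, heapBlk);

	if (flags != status)
		map[mapByte] |= (uint8_t) (flags << mapOffset);

	return status;
}
```

For heap block 5 the bits live in map byte 1 at bit offset 2, so setting
all-visible there turns that byte from 0x00 into 0x04.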



  [text/x-patch] v13-0015-Inline-TransactionIdFollows-Precedes.patch (5.0K, 14-v13-0015-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From 11aaf7bb0e74e846631bd6e82aae6f2ecf19e431 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v13 15/20] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
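
The modulo-2^32 comparison that the patch above moves into the header
can be exercised standalone. This is a minimal sketch, assuming local
stand-ins for the TransactionId typedef and TransactionIdIsNormal()
rather than the real access/transam.h; sketch_xid_precedes() is a
hypothetical name. It shows why a normal XID just below the wraparound
point logically precedes a small normal XID, while permanent XIDs
(values below 3) fall back to plain unsigned comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the definitions in access/transam.h. */
typedef uint32_t TransactionId;

#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Hypothetical copy of the comparison: is id1 logically < id2? */
static inline bool
sketch_xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Permanent XIDs sort before all normal XIDs: unsigned comparison. */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;

	/* Both normal: the sign of the wrapped difference gives the order. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

With id1 = 0xFFFFFFF0 and id2 = 10, the unsigned difference wraps to
0xFFFFFFE6, which is negative as an int32, so id1 precedes id2 even
though it is numerically larger.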



  [text/x-patch] v13-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 15-v13-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 12caa7c46ccd46d6b62efd58aac1eb1166bc141f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v13 13/20] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all for the purposes of determining whether to set the
page all-visible in the VM. In that case, it makes more sense to call
the function GlobalVisXidVisibleToAll().

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 538e82db8e6..480ada99e22 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -580,9 +580,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1182,11 +1182,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v13-0014-Use-GlobalVisState-to-determine-page-level-visib.patch (10.8K, 16-v13-0014-Use-GlobalVisState-to-determine-page-level-visib.patch)
  download | inline diff:
From 1c10116933e50ae2be845f38f5c09f13382bd0f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v13 14/20] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. Then, if
visibility_cutoff_xid has been maintained, we compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, because
previously we could set all_visible to false as soon as we encountered a
live tuple newer than OldestXmin. However, these extra comparisons were
found not to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
 src/backend/access/heap/pruneheap.c         | 48 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 19 ++++----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 480ada99e22..b8ca1be15a0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items until
 	 * directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -559,14 +558,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. When
+	 * setting the page all-visible or freezing all live tuples on the page,
+	 * it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -762,6 +759,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -1108,12 +1115,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(cutoffs);
-
 		Assert(prstate.lpdead_items == 0);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1638,19 +1643,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2dcca071a45..4ad05ba4db6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2717,7 +2717,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3462,13 +3462,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3487,7 +3487,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3508,7 +3508,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3580,8 +3580,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3600,8 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3679928d43e..fcd882cb03b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
 								HeapTuple oldtup, Buffer buffer);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
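
The once-per-page check described in the commit message above can be
illustrated with a small standalone sketch. It makes loud
simplifications: the GlobalVisState is modelled as a single horizon XID
(every XID strictly preceding it counts as visible to all), tuples are
reduced to an xmin plus a "live and committed" flag, and SketchTuple and
the sketch_* functions are hypothetical names that do not exist in the
tree. The point is only the shape of the logic: track the newest xmin
while walking the tuples, then test that one value against the horizon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the definitions in access/transam.h. */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Hypothetical, heavily reduced tuple: just what the check needs. */
typedef struct SketchTuple
{
	TransactionId xmin;
	bool		live_committed;
} SketchTuple;

static inline bool
sketch_xid_precedes(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;
	return (int32_t) (id1 - id2) < 0;
}

static inline bool
sketch_xid_follows(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical page-level check with a single-XID horizon. */
static inline bool
sketch_page_all_visible(const SketchTuple *tuples, int ntuples,
						TransactionId horizon)
{
	TransactionId newest_xmin = InvalidTransactionId;

	for (int i = 0; i < ntuples; i++)
	{
		/* Any tuple that is not live and committed rules the page out. */
		if (!tuples[i].live_committed)
			return false;

		/* Track the newest xmin instead of testing each one. */
		if (TransactionIdIsNormal(tuples[i].xmin) &&
			sketch_xid_follows(tuples[i].xmin, newest_xmin))
			newest_xmin = tuples[i].xmin;
	}

	/* One horizon comparison per page, on the newest xmin only. */
	if (TransactionIdIsNormal(newest_xmin) &&
		!sketch_xid_precedes(newest_xmin, horizon))
		return false;

	return true;
}
```

If the newest xmin is visible to all, every older xmin on the page is
too, which is why a single comparison after the loop is sufficient in
this simplified model.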



  [text/x-patch] v13-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.1K, 17-v13-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From f7cb1704e5716def42f8b0cdcbb6c390525c4cff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v13 17/20] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 67 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 +++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 277 insertions(+), 40 deletions(-)
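
A minimal standalone sketch of the executor-to-scan plumbing described in
the commit message, not code from the patch. The demo_* names and the 64-bit
mask standing in for the es_modified_relids Bitmapset are assumptions made
for illustration; only the bit position 1 << 10 for SO_ALLOW_VM_SET is taken
from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; DEMO_SO_ALLOW_VM_SET mirrors SO_ALLOW_VM_SET. */
#define DEMO_SO_TYPE_SEQSCAN	(UINT32_C(1) << 0)
#define DEMO_SO_ALLOW_VM_SET	(UINT32_C(1) << 10)

/*
 * Hypothetical stand-in for EState.es_modified_relids: bit i is set when
 * range table index i is a result relation or has a rowmark.
 */
typedef uint64_t DemoRelidSet;

static DemoRelidSet
demo_add_member(DemoRelidSet set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static bool
demo_is_member(int rti, DemoRelidSet set)
{
	assert(rti >= 0 && rti < 64);
	return ((set >> rti) & 1) != 0;
}

/*
 * Compute scan flags the way the *_vmset() begin-scan variants do: the scan
 * may set the VM only when the query does not modify the scanned relation.
 */
static uint32_t
demo_scan_flags(int scanrelid, DemoRelidSet modified_relids)
{
	uint32_t	flags = DEMO_SO_TYPE_SEQSCAN;

	if (!demo_is_member(scanrelid, modified_relids))
		flags |= DEMO_SO_ALLOW_VM_SET;

	return flags;
}
```

With RT index 2 recorded as modified, demo_scan_flags(1, set) carries
DEMO_SO_ALLOW_VM_SET while demo_scan_flags(2, set) does not, which is the
same decision the scan nodes make before handing a vmbuffer to
heap_page_prune_opt().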

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dfa9d5a460d..eedc7cb07bf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4f4a0af1f04..7523b936769 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VM;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer, false,
-									   InvalidBuffer,
-									   vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   vistest, options,
+									   NULL, &presult, PRUNE_ON_ACCESS,
+									   &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -519,12 +531,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * all-frozen for use in opportunistic freezing and to update the VM if
 	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM attempts freezing and setting the VM bits. But
-	 * other callers could do either one. The visibility bookkeeping is
-	 * required for opportunistic freezing (in addition to setting the VM
-	 * bits) because we only consider opportunistically freezing tuples if the
-	 * whole page would become all-frozen or if the whole page will be frozen
-	 * except for dead tuples that will be removed by vacuum.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
+	 *
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
 	 *
 	 * If only updating the VM, we must initialize all_frozen to false, as
 	 * heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -536,7 +553,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * whether or not to freeze but before deciding whether or not to update
 	 * the VM so that we don't set the VM bit incorrectly.
 	 *
-	 * If not freezing or updating the VM, we otherwise avoid the extra
+	 * If not freezing and not updating the VM, we avoid the extra
 	 * bookkeeping. Initializing all_visible to false allows skipping the work
 	 * to update them in heap_prune_record_unchanged_lp_normal().
 	 */
@@ -885,12 +902,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_frozen = false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate.consider_update_vm &&
+		prstate.all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate.consider_update_vm = false;
+		prstate.all_visible = prstate.all_frozen = false;
+	}
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * Handle setting visibility map bit based on information from the VM (if
+	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+	 * call), and from all_visible and all_frozen variables.
 	 */
 	if (prstate.consider_update_vm)
 	{
@@ -2284,8 +2319,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fcd882cb03b..2210a5e0a79 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v13-0018-Add-helper-functions-to-heap_page_prune_and_free.patch (19.2K, 18-v13-0018-Add-helper-functions-to-heap_page_prune_and_free.patch)
From b726ee54192582e78c5a3866d2a819993c2c798a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v13 18/20] Add helper functions to heap_page_prune_and_freeze

heap_page_prune_and_freeze() has gotten rather long. It has several
stages:

1) setup - where the PruneState is set up
2) tuple examination - where tuples and line pointers are examined to
   determine what needs to be pruned and what could be frozen
3) evaluation - where we determine, based on caller-provided options,
   heuristics, and state gathered during stage 2, whether or not to
   freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged

This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.

XXX: For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
 src/backend/access/heap/pruneheap.c | 473 +++++++++++++++++-----------
 1 file changed, 296 insertions(+), 177 deletions(-)
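
A simplified standalone sketch of the evaluation-stage decision that
heap_page_will_update_vm() makes in the diff below. It is an illustration
under assumptions, not the patch's code: the DemoPageState struct and
demo_* names are hypothetical, the buffer and VM lookups are replaced by
plain booleans, and the VM corruption check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VM_ALL_VISIBLE	0x01
#define DEMO_VM_ALL_FROZEN	0x02

/* Hypothetical flattened inputs to the decision. */
typedef struct DemoPageState
{
	bool		on_access;			/* reason == PRUNE_ON_ACCESS */
	bool		consider_update_vm; /* caller allowed VM updates */
	bool		all_visible;		/* page found all-visible */
	bool		all_frozen;			/* page found all-frozen */
	bool		do_prune;			/* pruning will modify the page */
	bool		do_freeze;			/* freezing will modify the page */
	bool		buffer_dirty;		/* stands in for BufferIsDirty() */
	bool		needs_fpi;			/* stands in for XLogCheckBufferNeedsBackup() */
	bool		blk_known_av;		/* VM bit already all-visible */
	bool		vm_all_frozen;		/* VM bit already all-frozen */
} DemoPageState;

/* Return the VM bits to set, or 0 if the VM should be left alone. */
static uint8_t
demo_vm_flags_to_set(DemoPageState *st)
{
	uint8_t		flags = 0;

	if (!st->consider_update_vm)
		return 0;

	/*
	 * On-access and nothing else to do on the page: skip the VM update if
	 * it would newly dirty the page or force a full-page image.
	 */
	if (st->on_access && st->all_visible &&
		!st->do_prune && !st->do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->consider_update_vm = false;
		st->all_visible = st->all_frozen = false;
	}

	assert(!st->all_frozen || st->all_visible);

	/* Set bits only when they add something the VM does not already have. */
	if (st->all_visible &&
		(!st->blk_known_av || (st->all_frozen && !st->vm_all_frozen)))
	{
		flags |= DEMO_VM_ALL_VISIBLE;
		if (st->all_frozen)
			flags |= DEMO_VM_ALL_FROZEN;
	}

	return flags;
}
```

A clean, all-visible page reached on access with nothing to prune yields 0,
while the same page with do_prune set yields DEMO_VM_ALL_VISIBLE, since the
page is being dirtied and WAL-logged anyway.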

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7523b936769..33a35dc5aab 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+								  bool do_prune,
+								  bool do_hint_full_or_prunable,
+								  bool did_tuple_hint_fpi,
+								  PruneState *prstate,
+								  bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+									 Buffer buffer, BlockNumber blockno, Page page,
+									 PruneReason reason,
+									 bool do_prune, bool do_freeze,
+									 bool blk_known_av,
+									 PruneState *prstate,
+									 Buffer *vmbuffer, uint8 *vmflags,
+									 bool *set_pd_all_visible);
+
 static bool identify_and_fix_vm_corruption(Relation relation,
 										   BlockNumber heap_blk,
 										   Buffer heap_buffer, Page heap_page,
@@ -382,6 +398,249 @@ identify_and_fix_vm_corruption(Relation relation,
 	return false;
 }
 
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+						 Buffer buffer, BlockNumber blockno, Page page,
+						 PruneReason reason,
+						 bool do_prune, bool do_freeze,
+						 bool blk_known_av,
+						 PruneState *prstate,
+						 Buffer *vmbuffer, uint8 *vmflags,
+						 bool *set_pd_all_visible)
+{
+	bool		do_set_vm = false;
+
+	/*
+	 * If the caller specified not to update the VM, validate everything is in
+	 * the right state and exit.
+	 */
+	if (!prstate->consider_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		/* We don't set only the page level visibility hint */
+		Assert(!(*set_pd_all_visible));
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->consider_update_vm &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+	{
+		prstate->consider_update_vm = false;
+		prstate->all_visible = prstate->all_frozen = false;
+	}
+
+	Assert(!prstate->all_frozen || prstate->all_visible);
+
+	/*
+	 * Clear any VM corruption. This does not need to be in a critical
+	 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+	 * may mark the heap page buffer dirty here and could end up doing so
+	 * again later. This is not a correctness issue and is in the path of VM
+	 * corruption, so we don't have to worry about the extra performance
+	 * overhead.
+	 */
+	if (identify_and_fix_vm_corruption(relation,
+									   blockno, buffer, page,
+									   blk_known_av, prstate->lpdead_items,
+									   *vmbuffer))
+	{
+		/* If we fix corruption, don't update the VM further */
+	}
+
+	/* Determine if we actually need to set the VM and which bits to set. */
+	else if (prstate->all_visible &&
+			 (!blk_known_av ||
+			  (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+	{
+		*vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
+	do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+	/*
+	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+	 * set, we strongly prefer to keep them in sync.
+	 *
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 */
+	*set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+	return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool do_prune,
+					  bool do_hint_full_or_prunable,
+					  bool did_tuple_hint_fpi,
+					  PruneState *prstate,
+					  bool *all_frozen_except_lp_dead)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!(*all_frozen_except_lp_dead));
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_full_or_prunable)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	/*
+	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+	 * make the choice of whether or not to freeze the page unaffected by the
+	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that freezing has been finalized, unset all_visible if there are
+	 * any LP_DEAD items on the page. It needs to reflect the present state of
+	 * the page when using it to determine whether or not to update the VM.
+	 *
+	 * Keep track of whether or not the page was all-frozen except LP_DEAD
+	 * items for the purposes of calculating the snapshot conflict horizon,
+	 * though.
+	 */
+	*all_frozen_except_lp_dead = prstate->all_frozen;
+	if (prstate->lpdead_items > 0)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
+
+	return do_freeze;
+}
+
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page. If the page's visibility status has changed, update it in
@@ -772,20 +1031,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
 
-	do_prune = prstate.nredirected > 0 ||
-		prstate.ndead > 0 ||
-		prstate.nunused > 0;
-
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
+	 * all-visible. This must be done before we decide whether or not to
+	 * opportunistically freeze below because we do not want to
+	 * opportunistically freeze the page if there are live tuples not visible
+	 * to everyone, which would prevent setting the page frozen in the VM.
 	 */
 	if (prstate.all_visible &&
 		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
+	/*
+	 * Now decide based on information collected while examining every tuple
+	 * which actions to take. If there are any prunable tuples, we'll prune
+	 * them. However, we will decide based on options specified by the caller
+	 * and various heuristics whether or not to freeze any tuples and whether
+	 * or not the page should be set all-visible/all-frozen in the VM.
+	 */
+	do_prune = prstate.nredirected > 0 ||
+		prstate.ndead > 0 ||
+		prstate.nunused > 0;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update those
@@ -796,186 +1065,36 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		PageIsFull(page);
 
 	/*
-	 * Decide if we want to go ahead with freezing according to the freeze
-	 * plans we prepared, or not.
-	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_full_or_prunable)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page. It needs to reflect the present state of
-	 * the page when using it to determine whether or not to update the VM.
-	 *
-	 * Keep track of whether or not the page was all-frozen except LP_DEAD
-	 * items for the purposes of calculating the snapshot conflict horizon,
-	 * though.
+	 * We must decide whether or not to freeze before deciding if and what to
+	 * set in the VM.
 	 */
-	all_frozen_except_lp_dead = prstate.all_frozen;
-	if (prstate.lpdead_items > 0)
-	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
-	}
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  do_prune,
+									  do_hint_full_or_prunable,
+									  did_tuple_hint_fpi,
+									  &prstate,
+									  &all_frozen_except_lp_dead);
+
+	do_set_vm = heap_page_will_update_vm(relation,
+										 buffer, blockno, page,
+										 reason,
+										 do_prune, do_freeze,
+										 blk_known_av,
+										 &prstate,
+										 &vmbuffer,
+										 &vmflags, &set_pd_all_visible);
 
-	/*
-	 * If this is an on-access call and we're not actually pruning, avoid
-	 * setting the visibility map if it would newly dirty the heap page or, if
-	 * the page is already dirty, if doing so would require including a
-	 * full-page image (FPI) of the heap page in the WAL. This situation
-	 * should be rare, as on-access pruning is only attempted when
-	 * pd_prune_xid is valid.
-	 */
-	if (reason == PRUNE_ON_ACCESS &&
-		prstate.consider_update_vm &&
-		prstate.all_visible &&
-		!do_prune && !do_freeze &&
-		(!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
-	{
-		prstate.consider_update_vm = false;
-		prstate.all_visible = prstate.all_frozen = false;
-	}
-
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
-	/*
-	 * Handle setting visibility map bit based on information from the VM (if
-	 * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
-	 * call), and from all_visible and all_frozen variables.
-	 */
-	if (prstate.consider_update_vm)
-	{
-		/*
-		 * Clear any VM corruption. This does not need to be in a critical
-		 * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
-		 * we may mark the heap page buffer dirty here and could end up doing
-		 * so again later. This is not a correctness issue and is in the path
-		 * of VM corruption, so we don't have to worry about the extra
-		 * performance overhead.
-		 */
-		if (identify_and_fix_vm_corruption(relation,
-										   blockno, buffer, page,
-										   blk_known_av, prstate.lpdead_items, vmbuffer))
-		{
-			/* If we fix corruption, don't update the VM further */
-		}
-
-		/* Determine if we actually need to set the VM and which bits to set. */
-		else if (prstate.all_visible &&
-				 (!blk_known_av ||
-				  (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
-		{
-			vmflags |= VISIBILITYMAP_ALL_VISIBLE;
-			if (prstate.all_frozen)
-				vmflags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-	}
-
-	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+	/* Save these for the caller in case we later zero out vmflags */
+	presult->new_vmbits = vmflags;
 
-	/* Lock vmbuffer before entering a critical section */
+	/* Lock vmbuffer before entering critical section */
 	if (do_set_vm)
 		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/*
-	 * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
-	 * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
-	 * set, we strongly prefer to keep them in sync.
-	 *
-	 * Prior to Postgres 19, it was possible for the page-level bit to be set
-	 * and the VM bit to be clear. This could happen if we crashed after
-	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 * Time to actually make the changes to the page and log them. Any error
+	 * while applying the changes is critical.
 	 */
-	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
-
-	/* Save these for the caller in case we later zero out vmflags */
-	presult->new_vmbits = vmflags;
-
-	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
 	if (do_hint_full_or_prunable)
-- 
2.43.0
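
For readers following the refactoring above, the bit-selection step in
heap_page_will_update_vm() can be modelled in isolation. This is only a
sketch under stated assumptions: the DEMO_* constants and
demo_choose_vmflags() are hypothetical names, the corruption check and the
on-access gate are left out, and vm_already_frozen stands in for the
result of the VM_ALL_FROZEN() lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the visibility map bit constants. */
#define DEMO_ALL_VISIBLE 0x01
#define DEMO_ALL_FROZEN  0x02

/*
 * Decide which VM bits to set. Bits are only requested when the page is
 * all-visible and either the VM did not already say so, or the page is
 * all-frozen and the VM does not yet reflect that.
 */
static uint8_t
demo_choose_vmflags(bool all_visible, bool all_frozen,
                    bool blk_known_av, bool vm_already_frozen)
{
    uint8_t vmflags = 0;

    if (all_visible &&
        (!blk_known_av || (all_frozen && !vm_already_frozen)))
    {
        vmflags |= DEMO_ALL_VISIBLE;
        if (all_frozen)
            vmflags |= DEMO_ALL_FROZEN;
    }

    return vmflags;
}
```

Note that a page already known all-visible and not all-frozen yields no
flags, so the caller would neither lock the VM buffer nor set
PD_ALL_VISIBLE in that case.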



  [text/x-patch] v13-0016-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 19-v13-0016-Unset-all-visible-sooner-if-not-freezing.patch)
From 8de90b1801ac3b59977d90c15aee854b40f3f043 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v13 16/20] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b8ca1be15a0..4f4a0af1f04 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1503,8 +1503,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1762,8 +1765,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
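
The effect of the two added checks can be sketched with a reduced model
of the prune state. DemoPruneState and demo_record_dead() are
hypothetical; only the field names mirror those used in the patch, and
the dead-offset array is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced version of the fields touched by the patch. */
typedef struct DemoPruneState
{
    bool attempt_freeze;   /* caller may freeze tuples on this page */
    bool all_visible;
    bool all_frozen;
    int  lpdead_items;
} DemoPruneState;

/*
 * Record one dead item. When freezing will not be attempted there is no
 * reason to keep tracking visibility, so both flags are cleared at once.
 * Otherwise they are left alone until the freeze decision has been made.
 */
static void
demo_record_dead(DemoPruneState *prstate)
{
    if (!prstate->attempt_freeze)
        prstate->all_visible = prstate->all_frozen = false;

    prstate->lpdead_items++;
}
```

With attempt_freeze set, the flags survive the call and are only cleared
later, which matches the delayed behaviour described in the commit
message.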



  [text/x-patch] v13-0020-Set-pd_prune_xid-on-insert.patch (6.5K, 20-v13-0020-Set-pd_prune_xid-on-insert.patch)
From e041d4a571d77f9159d914a89dcfc49a9419463d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v13 20/20] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which in turn changes the number of hits reported in the
index-killtuples isolation test. This is a quirk of how hits are
tracked, which sometimes leads to them being double-counted. This should
probably be fixed or changed independently.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eedc7cb07bf..442e557aeaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2546,8 +2550,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 84c2924967d..eed0619c0ad 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -481,6 +481,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -630,9 +636,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
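
A simplified model of the insert-side hint may help when reading the
hunks above. Everything here is a hypothetical stand-in: DemoXid,
DemoPage and demo_set_prunable_on_insert() are not PostgreSQL names, and
the "keep the oldest XID" rule is an assumption about how
PageSetPrunable() behaves rather than a copy of it. The sketch treats
XIDs below 3 as special (not normal), which is how the bootstrap case is
skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

#define DEMO_INVALID_XID      ((DemoXid) 0)
#define DEMO_FIRST_NORMAL_XID ((DemoXid) 3)

/* Hypothetical page header holding only the prune hint. */
typedef struct DemoPage
{
    DemoXid pd_prune_xid;
} DemoPage;

static bool
demo_xid_is_normal(DemoXid xid)
{
    return xid >= DEMO_FIRST_NORMAL_XID;
}

/* Modular "a is older than b" comparison for two normal XIDs. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Set the prune hint on insert. Special XIDs (for example in bootstrap
 * mode) leave the hint untouched; otherwise the oldest XID seen wins.
 */
static void
demo_set_prunable_on_insert(DemoPage *page, DemoXid xid)
{
    if (!demo_xid_is_normal(xid))
        return;

    if (page->pd_prune_xid == DEMO_INVALID_XID ||
        demo_xid_precedes(xid, page->pd_prune_xid))
        page->pd_prune_xid = xid;
}
```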



  [text/x-patch] v13-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (5.8K, 21-v13-0019-Reorder-heap_page_prune_and_freeze-parameters.patch)
From 696833a9710180be1d90507a9f267d4436327b2c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v13 19/20] Reorder heap_page_prune_and_freeze parameters

Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
 src/backend/access/heap/pruneheap.c  | 38 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 ++---
 src/include/access/heapam.h          |  4 +--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 33a35dc5aab..c1c0dae87ba 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, false,
+			heap_page_prune_and_freeze(relation, buffer, options, false,
 									   vmbuffer ? *vmbuffer : InvalidBuffer,
-									   vistest, options,
-									   NULL, &presult, PRUNE_ON_ACCESS,
+									   vistest,
+									   NULL, PRUNE_ON_ACCESS, &presult,
 									   &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -651,6 +651,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ *   UPDATE_VM indicates that we will set the page's status in the VM.
+ *
  * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
@@ -669,30 +678,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * contain the required block of the visibility map.
  *
  * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- *   UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -705,13 +705,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   int options,
 						   bool blk_known_av,
 						   Buffer vmbuffer,
 						   GlobalVisState *vistest,
-						   int options,
 						   struct VacuumCutoffs *cutoffs,
-						   PruneFreezeResult *presult,
 						   PruneReason reason,
+						   PruneFreezeResult *presult,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ad05ba4db6..4fb915e1d94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1993,11 +1993,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf,
+	heap_page_prune_and_freeze(rel, buf, prune_options,
 							   all_visible_according_to_vm,
 							   vmbuffer,
-							   vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+							   vacrel->vistest,
+							   &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2210a5e0a79..ca3f37c2925 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
 								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   int options,
 									   bool blk_known_av,
 									   Buffer vmbuffer,
 									   struct GlobalVisState *vistest,
-									   int options,
 									   struct VacuumCutoffs *cutoffs,
-									   PruneFreezeResult *presult,
 									   PruneReason reason,
+									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-10 20:01  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-10 20:01 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
<[email protected]> wrote:
> Fair. I've introduced new XLHP flags in attached v13. Hopefully it
> puts an end to the horror.

I suggest not renumbering all of the existing flags and just adding
these new ones at the end. Less code churn and more likely to break in
an obvious way if you mix up the two sets of flags.
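
As a rough illustration of that suggestion (all DEMO_* names and bit
positions below are hypothetical, not the actual XLHP flags), appending
new bits leaves every existing value unchanged and keeps the two sets
disjoint:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pre-existing flags; values must stay as they are. */
#define DEMO_HAS_REDIRECTIONS (1 << 0)
#define DEMO_HAS_DEAD_ITEMS   (1 << 1)
#define DEMO_HAS_UNUSED_ITEMS (1 << 2)

/* Hypothetical new flags, appended after the existing ones. */
#define DEMO_VM_ALL_VISIBLE   (1 << 3)
#define DEMO_VM_ALL_FROZEN    (1 << 4)

#define DEMO_OLD_FLAGS \
    (DEMO_HAS_REDIRECTIONS | DEMO_HAS_DEAD_ITEMS | DEMO_HAS_UNUSED_ITEMS)
#define DEMO_NEW_FLAGS \
    (DEMO_VM_ALL_VISIBLE | DEMO_VM_ALL_FROZEN)

/* Returns nonzero if the two flag sets share any bit. */
static int
demo_flag_sets_overlap(uint8_t a, uint8_t b)
{
    return (a & b) != 0;
}
```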

More on 0002:

+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;

Maybe just if (XLogHintBitIsNeeded()) set_heap_lsn = true? I don't feel
super-strongly that what you've done is bad but it looks weird to my
eyes.
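
A minimal standalone sketch of the two spellings, to show they behave
the same. XLogHintBitIsNeeded() is stubbed here with a hypothetical
variable (hint_bit_wal_needed); this is not the real macro, and the
wrapper functions are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real XLogHintBitIsNeeded() macro. */
static bool hint_bit_wal_needed = false;
#define XLogHintBitIsNeeded() (hint_bit_wal_needed)

/* Spelling used in the quoted hunk. */
static bool
set_heap_lsn_ternary(bool set_heap_lsn)
{
    set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
    return set_heap_lsn;
}

/* Suggested spelling: override only when the condition holds. */
static bool
set_heap_lsn_if(bool set_heap_lsn)
{
    if (XLogHintBitIsNeeded())
        set_heap_lsn = true;
    return set_heap_lsn;
}
```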

+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the

"setting a page in the visibility map" seems a little muddled to me.
You set bits, not pages.

+ * all-visible (or all-frozen, depending on the vacuum mode,) which is

This comma placement is questionable.

  /*
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
  */

How many copies of this comment do you plan to end up with?

The comment for log_heap_prune_and_freeze seems to be anticipating future work.

> > 0004. It is not clear to me why you need to get
> > log_heap_prune_and_freeze to do the work here. Why can't
> > log_newpage_buffer get the job done already?
>
> Well, I need something to emit the changes to the VM. I'm eliminating
> all users of xl_heap_visible. Empty pages are the ones that benefit
> the least from switching from xl_heap_visible -> xl_heap_prune. But,
> if I don't transition them, we have to maintain all the
> xl_heap_visible code (including visibilitymap_set() in its long form).
>
> As for log_newpage_buffer(), I could keep it if you think it is too
> confusing to change log_heap_prune_and_freeze()'s API (by passing
> force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
> there and then call log_heap_prune_and_freeze().
>
> I just thought it seemed simple to avoid emitting the new page record
> and the VM update record, so why not -- but I don't have strong
> feelings.

Yeah, I'm not sure what the right thing to do here is. I think I was
again experiencing brain fade by forgetting that there is a heap page
and a VM page and, of course, log_heap_newpage() probably isn't going
to touch the latter. So that makes sense. On the other hand, we could
only have one type of WAL record for every single operation in the
system if we gave it enough flags, and force_heap_fpi seems
suspiciously like a flag that turns this into a whole different kind
of WAL record.

> > 0005. It looks a little curious that you delete the
> > identify-corruption logic from the end of the if-nest and add it to
> > the beginning. Ceteris paribus, you'd expect that to be worse, since
> > corruption is a rare case.
>
> On master, the two corruption cases are sandwiched between the normal
> VM set cases. And I actually think doing it this way is brittle. If
> you put the cases which set the VM first, you have to completely
> bulletproof the if statements guarding them to foreclose any possible
> corruption case from entering because otherwise you will overwrite the
> corruption you then try to detect.

Hmm. In the current code, we first test (!all_visible_according_to_vm
&& presult.all_visible), then (all_visible_according_to_vm &&
!PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
PageIsAllVisible(page)). The first and second can never coexist,
because they require opposite values of all_visible_according_to_vm.
The second and third cannot coexist because they require opposite
values of PageIsAllVisible(page). It is not entirely obvious that the
first and third tests couldn't both pass, but you'd have to have
presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
to see how heap_page_prune_and_freeze() could ever allow that.
Consider:

    if (prstate.all_visible && prstate.lpdead_items == 0)
    {
        presult->all_visible = prstate.all_visible;
        presult->all_frozen = prstate.all_frozen;
    }
    else
    {
        presult->all_visible = false;
        presult->all_frozen = false;
    }
...
    presult->lpdead_items = prstate.lpdead_items;
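
The mutual-exclusion argument above can also be checked by brute force. What follows is a minimal standalone sketch, not PostgreSQL code: every name in it is hypothetical, the three tests are modeled as plain booleans, and the apply_adjustment parameter exists only to show what happens if the all_visible adjustment quoted above were missing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count the input combinations in which more than one of the three tests
 * passes.  With apply_adjustment set, all_visible is forced to false
 * whenever there are dead items, as in the snippet quoted above.
 */
static int
count_overlapping_cases(bool apply_adjustment)
{
    int overlaps = 0;

    for (int bits = 0; bits < 16; bits++)
    {
        bool vm_says_all_visible = (bits & 1) != 0;
        bool page_is_all_visible = (bits & 2) != 0;
        bool vm_status_nonzero = (bits & 4) != 0;
        bool prstate_all_visible = (bits & 8) != 0;

        for (int lpdead_items = 0; lpdead_items <= 2; lpdead_items++)
        {
            bool presult_all_visible = prstate_all_visible;

            if (apply_adjustment && lpdead_items > 0)
                presult_all_visible = false;

            /* The three tests, in the order described above. */
            bool test1 = !vm_says_all_visible && presult_all_visible;
            bool test2 = vm_says_all_visible && !page_is_all_visible &&
                vm_status_nonzero;
            bool test3 = lpdead_items > 0 && page_is_all_visible;

            if (test1 + test2 + test3 > 1)
                overlaps++;
        }
    }

    return overlaps;
}
```

With the adjustment applied, no combination passes more than one test; without it, the first and third tests can both pass, which is exactly the case the adjustment rules out.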

So I don't really think I'm persuaded that the current way is brittle.
But that having been said, I agree with you that the order of the
checks is kind of random, and I don't think it really matters that
much for performance. What does matter is clarity. I feel like what
I'd ideally like this logic to do is say: do we want the VM bit for
the page to be set to all-frozen, just all-visible, or neither? Then
push the VM bit to the correct state, dragging the page-level bit
along behind. And the current logic sort of does that. It's roughly:

1. Should we go from not-all-visible to either all-visible or
all-frozen? If yes, do so.
2. Should we go from either all-visible or all-frozen to
not-all-visible? If yes, do so.
3. Should we go from either all-visible or all-frozen to
not-all-visible for a different reason? If yes, do so.
4. Should we go from all-visible to all-frozen? If yes, do so.

But what's weird is that all the tests are written differently, and we
have two different reasons for going to not-all-visible, namely
PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
one test for each of the other state-transitions, because the
decision-making for those cases is fully completed at an earlier
stage. I would kind of like to see this expressed in a way that first
decides which state transition to make (forward-to-all-frozen,
forward-to-all-visible, backward-to-all-visible,
backward-to-not-all-visible, nothing) and then does the corresponding
work. What you're doing instead is splitting half of those functions
off into a helper function while keeping the other half where they are
without cleaning up any of the logic. Now, maybe that's OK: I'm far
from having grokked the whole patch set. But it is not any more clear
than what we have now, IMHO, and perhaps even a bit less so.
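
As a rough illustration of the "decide first, then act" shape described above, here is a minimal standalone sketch. It is not proposed patch code: the enums and function names are hypothetical, and it ignores the page-level PD_ALL_VISIBLE bit, locking, WAL, and corruption reporting entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VM states, ordered from least to most restrictive. */
typedef enum SketchVmState
{
    SKETCH_VM_NONE = 0,
    SKETCH_VM_ALL_VISIBLE = 1,
    SKETCH_VM_ALL_FROZEN = 2
} SketchVmState;

/* The five outcomes named above. */
typedef enum SketchVmTransition
{
    SKETCH_NOTHING,
    SKETCH_FORWARD_TO_ALL_VISIBLE,
    SKETCH_FORWARD_TO_ALL_FROZEN,
    SKETCH_BACKWARD_TO_ALL_VISIBLE,
    SKETCH_BACKWARD_TO_NOT_ALL_VISIBLE
} SketchVmTransition;

/* Step 1: decide which state the VM bits should be in. */
static SketchVmState
sketch_desired_state(bool all_visible, bool all_frozen, int lpdead_items)
{
    if (!all_visible || lpdead_items > 0)
        return SKETCH_VM_NONE;
    return all_frozen ? SKETCH_VM_ALL_FROZEN : SKETCH_VM_ALL_VISIBLE;
}

/* Step 2: name the transition; the caller then does the matching work. */
static SketchVmTransition
sketch_choose_transition(SketchVmState current, SketchVmState desired)
{
    if (desired == current)
        return SKETCH_NOTHING;
    if (desired > current)
        return desired == SKETCH_VM_ALL_FROZEN ?
            SKETCH_FORWARD_TO_ALL_FROZEN : SKETCH_FORWARD_TO_ALL_VISIBLE;
    return desired == SKETCH_VM_ALL_VISIBLE ?
        SKETCH_BACKWARD_TO_ALL_VISIBLE : SKETCH_BACKWARD_TO_NOT_ALL_VISIBLE;
}
```

A caller would switch on the result of sketch_choose_transition(), so each transition gets exactly one branch and the tests that lead to it live in one place.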

-- 
Robert Haas
EDB: http://www.enterprisedb.com






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-18 00:10  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-18 00:10 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Sep 10, 2025 at 4:01 PM Robert Haas <[email protected]> wrote:
>
> On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
> <[email protected]> wrote:
> > Fair. I've introduced new XLHP flags in attached v13. Hopefully it
> > puts an end to the horror.
>
> I suggest not renumbering all of the existing flags and just adding
> these new ones at the end. Less code churn and more likely to break in
> an obvious way if you mix up the two sets of flags.

Makes sense. In my attached v14, I have not renumbered them.

> More on 0002:

After an off-list discussion we had about how to make the patches in
the set progressively improve the code instead of just mechanically
refactoring it, I have made some big changes in the intermediate
patches in the set.

Before actually including the VM changes in the vacuum/prune WAL
records, I first include setting PD_ALL_VISIBLE with the other changes
to the heap page so that we can remove the heap page from the VM
setting WAL chain. This happens to fix the bug we discussed where if
you set an all-visible page all-frozen and checksums/wal_log_hints are
enabled, you may end up setting an LSN on a page that was not marked
dirty.

0001 is RFC but waiting on one other reviewer
0002 - 0007 is a bit of cleanup I had later in the patch set but moved
up because I think it made the intermediate patches better
0008 - 0012 removes the heap page from the XLOG_HEAP2_VISIBLE WAL
chain (it makes all callers of visibilitymap_set() set PD_ALL_VISIBLE
in the same WAL record as changes to the heap page)
0013 - 0018 finish the job eliminating XLOG_HEAP2_VISIBLE and set VM
bits in the same WAL record as the heap changes
0019 - 0024 set the VM on-access

>   /*
> + * Note that the heap relation may have been dropped or truncated, leading
> + * us to skip updating the heap block due to the LSN interlock. However,
> + * even in that case, it's still safe to update the visibility map. Any
> + * WAL record that clears the visibility map bit does so before checking
> + * the page LSN, so any bits that need to be cleared will still be
> + * cleared.
> + *
> + * Note that the lock on the heap page was dropped above. In normal
> + * operation this would never be safe because a concurrent query could
> + * modify the heap page and clear PD_ALL_VISIBLE -- violating the
> + * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
> + * the VM is set.
> + *
> + * In recovery, we expect no other writers, so writing to the VM page
> + * without holding a lock on the heap page is considered safe enough. It
> + * is done this way when replaying xl_heap_visible records (see
>   */
>
> How many copies of this comment do you plan to end up with?

By the end, one for copy freeze replay and one for prune/freeze/vacuum
replay. I felt two wasn't too bad and was easier than meta-explaining
what the other comment was explaining.

> > > 0004. It is not clear to me why you need to get
> > > log_heap_prune_and_freeze to do the work here. Why can't
> > > log_newpage_buffer get the job done already?
> >
> > Well, I need something to emit the changes to the VM. I'm eliminating
> > all users of xl_heap_visible. Empty pages are the ones that benefit
> > the least from switching from xl_heap_visible -> xl_heap_prune. But,
> > if I don't transition them, we have to maintain all the
> > xl_heap_visible code (including visibilitymap_set() in its long form).
> >
> > As for log_newpage_buffer(), I could keep it if you think it is too
> > confusing to change log_heap_prune_and_freeze()'s API (by passing
> > force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
> > there and then call log_heap_prune_and_freeze().
> >
> > I just thought it seemed simple to avoid emitting the new page record
> > and the VM update record, so why not -- but I don't have strong
> > feelings.
>
> Yeah, I'm not sure what the right thing to do here is. I think I was
> again experiencing brain fade by forgetting that there is a heap page
> and a VM page and, of course, log_heap_newpage() probably isn't going
> to touch the latter. So that makes sense. On the other hand, we could
> only have one type of WAL record for every single operation in the
> system if we gave it enough flags, and force_heap_fpi seems
> suspiciously like a flag that turns this into a whole different kind
> of WAL record.

I've kept log_heap_newpage() and used log_heap_prune_and_freeze() for
setting PD_ALL_VISIBLE and the VM.

> > > 0005. It looks a little curious that you delete the
> > > identify-corruption logic from the end of the if-nest and add it to
> > > the beginning. Ceteris paribus, you'd expect that to be worse, since
> > > corruption is a rare case.
> >
> > On master, the two corruption cases are sandwiched between the normal
> > VM set cases. And I actually think doing it this way is brittle. If
> > you put the cases which set the VM first, you have to completely
> > bulletproof the if statements guarding them to foreclose any possible
> > corruption case from entering because otherwise you will overwrite the
> > corruption you then try to detect.
>
> Hmm. In the current code, we first test (!all_visible_according_to_vm
> && presult.all_visible), then (all_visible_according_to_vm &&
> !PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
> blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
> PageIsAllVisible(page)). The first and second can never coexist,
> because they require opposite values of all_visible_according_to_vm.
> The second and third cannot coexist because they require opposite
> values of PageIsAllVisible(page). It is not entirely obvious that the
> first and third tests couldn't both pass, but you'd have to have
> presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
> to see how heap_page_prune_and_freeze() could ever allow that.
> Consider:
>
>     if (prstate.all_visible && prstate.lpdead_items == 0)
>     {
>         presult->all_visible = prstate.all_visible;
>         presult->all_frozen = prstate.all_frozen;
>     }
>     else
>     {
>         presult->all_visible = false;
>         presult->all_frozen = false;
>     }
> ...
>     presult->lpdead_items = prstate.lpdead_items;
>
> So I don't really think I'm persuaded that the current way is brittle.

I meant brittle because it has to be so carefully coded for it to work
out this way. If you ever wanted to change or enhance it, it's quite
hard to know how to make sure all of them are entirely mutually
exclusive.

> But that having been said, I agree with you that the order of the
> checks is kind of random, and I don't think it really matters that
> much for performance. What does matter is clarity. I feel like what
> I'd ideally like this logic to do is say: do we want the VM bit for
> the page to be set to all-frozen, just all-visible, or neither? Then
> push the VM bit to the correct state, dragging the page-level bit
> along behind. And the current logic sort of does that. It's roughly:
>
> 1. Should we go from not-all-visible to either all-visible or
> all-frozen? If yes, do so.
> 2. Should we go from either all-visible or all-frozen to
> not-all-visible? If yes, do so.
> 3. Should we go from either all-visible or all-frozen to
> not-all-visible for a different reason? If yes, do so.
> 4. Should we go from all-visible to all-frozen? If yes, do so.

I don't necessarily agree that fixing corruption and setting the VM
should be together -- they feel like separate things to me. But, I
don't feel strongly enough about it to push it.

> But what's weird is that all the tests are written differently, and we
> have two different reasons for going to not-all-visible, namely
> PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
> one test for each of the other state-transitions, because the
> decision-making for those cases is fully completed at an earlier
> stage. I would kind of like to see this expressed in a way that first
> decides which state transition to make (forward-to-all-frozen,
> forward-to-all-visible, backward-to-all-visible,
> backward-to-not-all-visible, nothing) and then does the corresponding
> work. What you're doing instead is splitting half of those functions
> off into a helper function while keeping the other half where they are
> without cleaning up any of the logic. Now, maybe that's OK: I'm far
> from having grokked the whole patch set. But it is not any more clear
> than what we have now, IMHO, and perhaps even a bit less so.

In terms of my patch set, I do have to change something about this
mixture of fixing corruption and setting the VM because I need to set
the VM bits in the same critical section as making the other changes
to the heap page (pruning, etc) and include the VM set changes in the
same WAL record (note that clearing the VM to fix corruption is not
WAL-logged).

What I've gone with is determining what to set the VM bits to and then
fixing the corruption at the same time. Then, later, when making the
changes to the heap page, I actually set the VM. This is kind of the
opposite of what you suggested above -- determining what to set the
bits to altogether -- corruption and non-corruption cases together. I
don't think we can do that though, because fixing the corruption makes
non-WAL-logged changes to the page and VM, whereas setting the VM bits is a
WAL-logged change. And, you can't clear bits with visibilitymap_set()
(there's an assertion about that). So you have to call different
functions (not to mention emit distinct error messages). I don't know
that I've come up with the ideal solution, though.
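
To illustrate the two-phase shape described above, here is a toy standalone sketch. It is not the patch's code: ToyPage, the TOY_* flags, and every function here are hypothetical, the "WAL record" is just a counter, and the assert in toy_vm_set() only imitates the restriction that the set path cannot be used to clear bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN  0x02

/* Toy model of a heap page plus its VM bits; all fields are assumptions. */
typedef struct ToyPage
{
    bool    pd_all_visible;     /* page-level flag */
    uint8_t vm_bits;            /* bits for this page in the toy VM */
    int     wal_records;        /* count of WAL-logged changes */
} ToyPage;

/* Corruption repair: clears both flags and is not WAL-logged. */
static void
toy_fix_corruption(ToyPage *page)
{
    page->vm_bits = 0;
    page->pd_all_visible = false;
}

/* WAL-logged set path: may only add bits, never clear them. */
static void
toy_vm_set(ToyPage *page, uint8_t flags)
{
    assert(flags != 0);
    page->pd_all_visible = true;
    page->vm_bits |= flags;
    page->wal_records++;
}

/* Phase 1: repair any corruption and decide which bits to set later. */
static uint8_t
toy_plan_vm_bits(ToyPage *page, bool all_visible, bool all_frozen,
                 int lpdead_items)
{
    if ((page->vm_bits != 0 && !page->pd_all_visible) ||
        (lpdead_items > 0 && page->pd_all_visible))
        toy_fix_corruption(page);

    if (!all_visible || lpdead_items > 0)
        return 0;
    return all_frozen ? (TOY_ALL_VISIBLE | TOY_ALL_FROZEN) : TOY_ALL_VISIBLE;
}

/* Phase 2: alongside the other heap-page changes, set the planned bits. */
static void
toy_apply_vm_bits(ToyPage *page, uint8_t new_bits)
{
    if ((new_bits & ~page->vm_bits) != 0)
        toy_vm_set(page, new_bits);
}
```

In this toy model, clearing never goes through the set path and never bumps the WAL counter, which is the reason the two cases end up in different functions.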

- Melanie


Attachments:

  [text/x-patch] v14-0003-Reorder-heap_page_prune_and_freeze-parameters.patch (6.2K, 2-v14-0003-Reorder-heap_page_prune_and_freeze-parameters.patch)
  download | inline diff:
From da4f0d141c8fa673a4651c42efd8bc48cd88c485 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v14 03/24] Reorder heap_page_prune_and_freeze parameters

Move read-only parameters to the beginning of the function, making it
more clear which parameters are inputs and which are input/outputs or
outputs. Also const-qualify VacuumCutoffs, which is not modified in
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 40 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 +++--
 src/include/access/heapam.h          |  6 ++---
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..28bd6a56749 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
-	struct VacuumCutoffs *cutoffs;
+	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
 	 * Fields describing what to do to the page
@@ -260,8 +260,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+									   vistest, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -303,7 +303,17 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
+ *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
@@ -313,29 +323,19 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
  * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
  *
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -348,11 +348,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
+						   PruneReason reason,
 						   int options,
-						   struct VacuumCutoffs *cutoffs,
+						   const struct VacuumCutoffs *cutoffs,
+						   GlobalVisState *vistest,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..ddc9677694c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1974,8 +1974,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+							   &vacrel->cutoffs,
+							   vacrel->vistest,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..34206a6a7d5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -374,11 +374,11 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   struct GlobalVisState *vistest,
+									   PruneReason reason,
 									   int options,
-									   struct VacuumCutoffs *cutoffs,
+									   const struct VacuumCutoffs *cutoffs,
+									   struct GlobalVisState *vistest,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v14-0005-Rename-PruneState.freeze-to-attempt_freeze.patch (4.9K, 3-v14-0005-Rename-PruneState.freeze-to-attempt_freeze.patch)
  download | inline diff:
From 94b4e946cd498470e9a0fac0b15299feaccfeefc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v14 05/24] Rename PruneState.freeze to attempt_freeze

This makes it more clear that this is to indicate the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
ultimately will end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.

And rename local variable do_hint to do_hint_prune. This distinguishes
the prunable and page full hints used to decide whether or not to
on-access prune a page from other page-level and tuple hint bits.
---
 src/backend/access/heap/pruneheap.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea8216e0632..740aa07cd83 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,7 +42,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -361,14 +361,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
-	bool		hint_bit_fpi;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -390,7 +390,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -437,7 +437,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -551,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -659,7 +659,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -667,7 +667,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -702,14 +702,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_prune)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -752,7 +752,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_prune)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -893,7 +893,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1475,7 +1475,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v14-0002-Correct-prune-WAL-record-opcode-mention-in-comme.patch (1.3K, 4-v14-0002-Correct-prune-WAL-record-opcode-mention-in-comme.patch)
  download | inline diff:
From d89c39061d008ccfe306c9c39e7b74f9555a4ac2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 14:54:42 -0400
Subject: [PATCH v14 02/24] Correct prune WAL record opcode mention in comment

f83d709760d8 incorrectly refers to a XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such code exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.
---
 src/backend/access/heap/pruneheap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..d8ea0c78f77 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -794,7 +794,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		MarkBufferDirty(buffer);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
 		if (RelationNeedsWAL(relation))
 		{
@@ -2026,7 +2026,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 }
 
 /*
- * Write an XLOG_HEAP2_PRUNE_FREEZE WAL record
+ * Write an XLOG_HEAP2_PRUNE* WAL record
  *
  * This is used for several different page maintenance operations:
  *
-- 
2.43.0



  [text/x-patch] v14-0004-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (5.3K, 5-v14-0004-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
  download | inline diff:
From 51729486db735989377d18bfc855d0d3d7f32114 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v14 04/24] Keep all_frozen updated in
 heap_page_prune_and_freeze

We previously relied on only using all-visible and all-frozen together
but it's best to keep them both updated.

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c  | 21 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  9 ++++-----
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28bd6a56749..ea8216e0632 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -142,10 +142,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -696,8 +692,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * used anymore.  The opportunistic freeze heuristic must be
 			 * improved; however, for now, try to approximate the old logic.
 			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
+			if (prstate.all_frozen && prstate.nfrozen > 0)
 			{
+				Assert(prstate.all_visible);
+
 				/*
 				 * Freezing would make the page all-frozen.  Have already
 				 * emitted an FPI or will do so anyway?
@@ -750,6 +748,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -819,7 +818,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 */
 			if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
+				if (prstate.all_frozen)
 					frz_conflict_horizon = prstate.visibility_cutoff_xid;
 				else
 				{
@@ -1382,7 +1381,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1404,7 +1403,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1417,7 +1416,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1436,7 +1435,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1454,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ddc9677694c..50cc898087f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2003,7 +2003,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2056,6 +2055,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2161,11 +2161,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0
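
For readers following the hunks above: they all rely on the rule that
all_frozen is never true unless all_visible is also true, which is why the
redundant "all_visible &&" tests can be dropped. A minimal standalone sketch of
that rule follows. The PageVisFlags struct and the page_vis_* helpers are
hypothetical names for illustration only; they are not part of the patch or of
PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two PruneState flags touched above. */
typedef struct PageVisFlags
{
	bool		all_visible;
	bool		all_frozen;
} PageVisFlags;

/*
 * Clear both flags together, mirroring the patch's
 * "all_visible = all_frozen = false" assignments.
 */
static void
page_vis_clear(PageVisFlags *f)
{
	f->all_visible = f->all_frozen = false;
}

/*
 * Once both flags are always cleared together, checking all_frozen alone is
 * enough; the assertion documents the invariant.
 */
static bool
page_vis_is_all_frozen(const PageVisFlags *f)
{
	assert(!f->all_frozen || f->all_visible);
	return f->all_frozen;
}
```

Clearing the flags in one place keeps the invariant true at every point where
a caller might look at all_frozen without also looking at all_visible.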



  [text/x-patch] v14-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (12.1K, 6-v14-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch)
  download | inline diff:
From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++------
 src/backend/access/heap/heapam_xlog.c   | 54 +++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  2 +
 5 files changed, 154 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..c8cd9d22726 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 * going to add further frozen rows to it.
 		 *
 		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(relation,
+									 BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..faa7c561a8a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * Note that the heap relation may have been dropped or truncated, leading
+	 * us to skip updating the heap block due to the LSN interlock. However,
+	 * even in that case, it's still safe to update the visibility map. Any
+	 * WAL record that clears the visibility map bit does so before checking
+	 * the page LSN, so any bits that need to be cleared will still be
+	 * cleared.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		visibilitymap_set_vmbits(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %u",
+		 flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
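
The bit arithmetic in visibilitymap_set_vmbits() can be tried outside the
server with a toy model. The sketch below assumes two status bits per heap
block (so four heap blocks per map byte) and flag values 0x01 and 0x02, which
is how I read visibilitymap.c and visibilitymapdefs.h. The toy_* names and the
16-byte map are hypothetical, and buffer pinning, locking and WAL are left out
entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, modeled on the visibility map definitions. */
#define TOY_VM_ALL_VISIBLE		0x01
#define TOY_VM_ALL_FROZEN		0x02
#define TOY_VM_VALID_BITS		0x03
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	4	/* 8 bits / 2 bits per heap block */
#define TOY_MAP_BYTES			16	/* hypothetical; a real VM page is much larger */

static uint8_t toy_map[TOY_MAP_BYTES];

/* Reset the toy map to "no bits set". */
static void
toy_vm_reset(void)
{
	memset(toy_map, 0, sizeof(toy_map));
}

/* Return the status bits currently stored for heap_blk. */
static uint8_t
toy_vm_get_bits(uint32_t heap_blk)
{
	uint32_t	map_byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	unsigned	map_offset = (heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK;

	assert(map_byte < TOY_MAP_BYTES);
	return (uint8_t) ((toy_map[map_byte] >> map_offset) & TOY_VM_VALID_BITS);
}

/*
 * OR the given flags into the map for heap_blk and return the previous
 * status. Like the function in the patch, this never clears bits.
 */
static uint8_t
toy_vm_set_bits(uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	unsigned	map_offset = (heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert(map_byte < TOY_MAP_BYTES);
	/* Only valid bits may be passed */
	assert((flags & TOY_VM_VALID_BITS) == flags);
	/* Never set all-frozen without all-visible */
	assert(flags != TOY_VM_ALL_FROZEN);

	status = toy_vm_get_bits(heap_blk);
	if (flags != status)
		toy_map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

For example, heap block 5 lands in map byte 1 at bit offset 2, so setting both
flags there turns that byte into 0x0C while neighbouring blocks stay untouched.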



  [text/x-patch] v14-0007-Update-PruneState.all_-visible-frozen-sooner-in-.patch (7.3K, 7-v14-0007-Update-PruneState.all_-visible-frozen-sooner-in-.patch)
  download | inline diff:
From de93f7eaffb009436cae2f80571ba0148f99db7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v14 07/24] Update PruneState.all_[visible|frozen] sooner in
 pruning

We don't clear PruneState.all_visible and all_frozen during pruning when
we see LP_DEAD items because we want to still opportunistically freeze a
page if it would become frozen after vacuum's third phase.

Currently, this is fine because heap_page_prune_and_freeze() doesn't set
PD_ALL_VISIBLE or set bits in the VM. If we want to do that in the
future, we need all_visible and all_frozen to be accurate earlier in
heap_page_prune_and_freeze(). To do this, we must also move up
determination of the freeze conflict horizon. We use the visibility
cutoff xid even if the whole page won't be frozen until after vacuum's
third phase.
---
 src/backend/access/heap/pruneheap.c | 95 ++++++++++++++---------------
 1 file changed, 45 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed74de6f27..5e536bd0d4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -296,7 +296,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * pre-freeze checks.
  *
  * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon to use for the WAL record should we decide to
+ * freeze tuples.
  *
  * prstate is an input/output parameter.
  *
@@ -308,7 +310,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -378,6 +381,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it.  Otherwise we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -478,6 +497,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
@@ -546,10 +566,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -784,8 +804,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
 
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
@@ -846,27 +882,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -890,30 +907,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
-- 
2.43.0
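
The conflict horizon choice that 0007 moves into heap_page_will_freeze() can
be modeled in isolation. This is only a sketch: ToyXid and the toy_* functions
are hypothetical, XIDs are treated as plain 32-bit counters, and the value 3
for the first normal XID is my assumption about FirstNormalTransactionId.
toy_xid_retreat() is meant to mimic what TransactionIdRetreat does, namely
step back by one and skip the reserved XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Assumed: XIDs below this value are reserved and never "normal". */
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)

/* Step back one XID, wrapping around and skipping the reserved values. */
static ToyXid
toy_xid_retreat(ToyXid xid)
{
	do
	{
		xid--;
	} while (xid < TOY_FIRST_NORMAL_XID);

	return xid;
}

/*
 * Pick the snapshot conflict horizon for a record that freezes tuples.
 * If the page will be all-frozen, the visibility cutoff is precise enough;
 * otherwise fall back to a conservative value just before OldestXmin.
 */
static ToyXid
toy_freeze_conflict_horizon(bool all_frozen,
							ToyXid visibility_cutoff_xid,
							ToyXid oldest_xmin)
{
	assert(oldest_xmin >= TOY_FIRST_NORMAL_XID);

	if (all_frozen)
		return visibility_cutoff_xid;

	return toy_xid_retreat(oldest_xmin);
}
```

With oldest_xmin = 1000 the fallback horizon is 999, and retreating from the
first normal XID wraps around to the top of the 32-bit range instead of
returning a reserved value.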



  [text/x-patch] v14-0006-Add-helper-for-freeze-determination-to-heap_page.patch (7.0K, 8-v14-0006-Add-helper-for-freeze-determination-to-heap_page.patch)
  download | inline diff:
From aee92ee8a07beade81a82200fbbfe605d499ac4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v14 06/24] Add helper for freeze determination to
 heap_page_prune_and_freeze

After scanning through the line pointers on the heap page during
vacuum's first phase, we use several status flags and other information
collected during the scan to decide whether to execute the freeze plans
we assembled.

Do this in a helper for better readability.
---
 src/backend/access/heap/pruneheap.c | 199 +++++++++++++++++-----------
 1 file changed, 119 insertions(+), 80 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 740aa07cd83..4ed74de6f27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -289,6 +289,120 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			Assert(prstate->all_visible);
+
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -666,87 +780,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				Assert(prstate.all_visible);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  did_tuple_hint_fpi,
+									  do_prune,
+									  do_hint_prune,
+									  &prstate);
 
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_prune)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
-- 
2.43.0
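
The decision tree that 0006 factors out into heap_page_will_freeze() is easier
to see as a pure function over booleans. The sketch below is a simplified
model, not the patch's code: ToyFreezeInputs and toy_page_will_freeze() are
hypothetical names, buffer_needs_backup stands in for
XLogCheckBufferNeedsBackup(buffer), hint_bit_wal_needed stands in for
XLogHintBitIsNeeded(), and the side effects on prstate (resetting all_frozen
and nfrozen when we decline to freeze) are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs the helper consults. */
typedef struct ToyFreezeInputs
{
	bool		attempt_freeze;		/* caller allows freezing at all */
	bool		freeze_required;	/* an old XID/MXID forces freezing */
	bool		all_frozen;			/* page would be all-frozen afterwards */
	int			nfrozen;			/* number of prepared freeze plans */
	bool		relation_needs_wal;
	bool		did_tuple_hint_fpi;	/* an FPI was already emitted */
	bool		do_prune;
	bool		do_hint_prune;
	bool		buffer_needs_backup;	/* next WAL record would carry an FPI */
	bool		hint_bit_wal_needed;	/* checksums or wal_log_hints enabled */
} ToyFreezeInputs;

/* Return true if the prepared freeze plans should be executed. */
static bool
toy_page_will_freeze(const ToyFreezeInputs *in)
{
	assert(in->nfrozen >= 0);

	if (!in->attempt_freeze)
		return false;

	/* Mandatory freezing always wins */
	if (in->freeze_required)
		return true;

	/* Opportunistic freezing only pays off if the page ends up all-frozen */
	if (!(in->all_frozen && in->nfrozen > 0))
		return false;

	if (!in->relation_needs_wal)
		return false;

	/* Freeze only if an FPI was, or will be, emitted anyway */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_backup;
	if (in->do_hint_prune)
		return in->hint_bit_wal_needed && in->buffer_needs_backup;

	return false;
}
```

Writing it this way makes the ordering visible: the mandatory case is checked
first, and the three FPI cases are mutually exclusive branches, so do_prune
being true means the do_hint_prune branch is never consulted.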



  [text/x-patch] v14-0008-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch (16.1K, 9-v14-0008-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch)
  download | inline diff:
From 7ae7f9d9f1c05cf66d7fee964db801cbcf52a324 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:32:35 -0400
Subject: [PATCH v14 08/24] Set PD_ALL_VISIBLE in heap_page_prune_and_freeze

After phase I of vacuum, if the heap page was rendered all-visible, we
can set it as such in the VM. We also must set the page-level
PD_ALL_VISIBLE bit. By setting PD_ALL_VISIBLE while making the other
changes to the heap page instead of while updating the VM, we can omit
the heap page from the WAL chain during the VM update. The result is
that xl_heap_prune records include updates to PD_ALL_VISIBLE.

This commit doesn't yet remove the heap page from the WAL chain because
it does not change other users of visibilitymap_set().

Note that this is carefully coded such that if the only modification to
the page during heap_page_prune_and_freeze() is setting PD_ALL_VISIBLE
and checksums/wal_log_hints are disabled we will never emit a full page
image of the heap page.

This also fixes a longstanding issue where, when checksums/wal_log_hints
are enabled, an all-visible page being set all-frozen may not mark the
buffer dirty before visibilitymap_set() stamps it with the
xl_heap_visible LSN.

It is noteworthy that the checks for page corruption and an inconsistent
state between the heap page and the VM in lazy_scan_prune() now happen
after having set PD_ALL_VISIBLE. That is not a functional change because
the corruption cases are mutually exclusive with cases where we would
set PD_ALL_VISIBLE.
---
 src/backend/access/heap/heapam_xlog.c | 63 +++++++++++++++++++----
 src/backend/access/heap/pruneheap.c   | 72 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c  | 29 +----------
 src/include/access/heapam.h           |  2 +
 src/include/access/heapam_xlog.h      |  2 +
 5 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index faa7c561a8a..a54238f2b59 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -90,6 +90,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+		bool		do_prune;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -97,11 +98,13 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,17 +141,52 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * The critical integrity requirement here is that we must never end
+		 * up with a situation where the visibility map bit is set, and the
+		 * page-level PD_ALL_VISIBLE bit is clear.  If that were to occur,
+		 * then a subsequent page modification would fail to clear the
+		 * visibility map bit.
+		 */
+		if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
+			PageSetAllVisible(page);
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
-
-		PageSetLSN(page, lsn);
 		MarkBufferDirty(buffer);
+
+		/*
+		 * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+		 * careful not to emit a full page image unless
+		 * checksums/wal_log_hints are enabled. We only set the heap page LSN
+		 * if full page images were an option when emitting WAL. Otherwise,
+		 * subsequent modifications of the page may incorrectly skip emitting
+		 * a full page image.
+		 */
+		if (do_prune || nplans > 0 ||
+			(xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers, or set PD_ALL_VISIBLE, update
+	 * the free space map.
+	 *
+	 * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since the FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
@@ -157,10 +195,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	{
 		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
 						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+						   XLHP_HAS_NOW_UNUSED_ITEMS |
+						   XLHP_SET_PD_ALL_VIS))
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
+			/*
+			 * We want to avoid holding an exclusive lock on the heap buffer
+			 * while doing IO, so we'll release the lock on the heap buffer
+			 * first.
+			 */
 			UnlockReleaseBuffer(buffer);
 
 			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
@@ -173,10 +217,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 /*
  * Replay XLOG_HEAP2_VISIBLE records.
  *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
+ * the heap page. We must never end up with a situation where the visibility
+ * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear.  If that
+ * were to occur, then a subsequent page modification would fail to clear the
+ * visibility map bit.
  */
 static void
 heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e536bd0d4d..9b25131543b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -495,6 +495,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -824,6 +825,22 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
+	 * allowed for the page-level bit to be set and the VM to be clear.
+	 * Setting PD_ALL_VISIBLE when we are making the changes to the page that
+	 * render it all-visible allows us to omit the heap page from the WAL
+	 * chain when later updating the VM -- even when checksums/wal_log_hints
+	 * are enabled.
+	 */
+	do_set_pd_vis = false;
+	if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+	{
+		if (prstate.all_visible && !PageIsAllVisible(page))
+			do_set_pd_vis = true;
+	}
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -844,14 +861,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_pd_vis)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,6 +885,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -891,7 +914,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 			log_heap_prune_and_freeze(relation, buffer,
 									  conflict_xid,
-									  true, reason,
+									  true,
+									  do_set_pd_vis,
+									  reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -2078,6 +2103,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2086,6 +2115,7 @@ void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2125,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2134,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 * Note that if we explicitly skip an FPI, we must not set the heap page
+	 * LSN later.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2112,7 +2156,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2169,6 +2213,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (set_pd_all_vis)
+		xlrec.flags |= XLHP_SET_PD_ALL_VIS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2247,17 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	/*
+	 * We must bump the page LSN if pruning or freezing. If we are only
+	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
+	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+	 * for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 50cc898087f..308abff16ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1970,7 +1970,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -2073,21 +2073,6 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
@@ -2168,17 +2153,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		uint8		old_vmbits;
 
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
 		/*
 		 * Set the page all-frozen (and all-visible) in the VM.
 		 *
@@ -2891,6 +2865,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
 								  InvalidTransactionId,
 								  false,	/* no cleanup lock required */
+								  false,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34206a6a7d5..2f77d8dbcd6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -390,6 +391,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..7d3fb75dda7 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -294,6 +294,8 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+#define		XLHP_SET_PD_ALL_VIS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0
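
The full-page-image and page-LSN rules described in the 08/24 commit message can be restated as a pair of predicates. The following is a minimal sketch, not code from the patch: the struct PruneLogInputs and the three function names are hypothetical, and the hint_bits_need_wal field stands in for the result of XLogHintBitIsNeeded(). It illustrates that "may register the buffer with REGBUF_NO_IMAGE" and "must set the page LSN" are exact complements, which is why the same condition appears at both log time and replay time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one prune/freeze record changes on a page. */
typedef struct PruneLogInputs
{
	int			nredirected;
	int			ndead;
	int			nunused;
	int			nfrozen;
	bool		set_pd_all_vis;		/* record sets PD_ALL_VISIBLE */
	bool		hint_bits_need_wal; /* stand-in for XLogHintBitIsNeeded() */
} PruneLogInputs;

/* True if the record changes any line pointers (the patch's do_prune). */
static bool
changes_line_pointers(const PruneLogInputs *in)
{
	return in->nredirected > 0 || in->ndead > 0 || in->nunused > 0;
}

/*
 * True if the page LSN must be advanced: always when pruning or freezing,
 * and for a PD_ALL_VISIBLE-only change just when hint bits are WAL-logged.
 */
static bool
must_set_page_lsn(const PruneLogInputs *in)
{
	return changes_line_pointers(in) ||
		in->nfrozen > 0 ||
		(in->set_pd_all_vis && in->hint_bits_need_wal);
}

/* True if the heap buffer may be registered without a full page image. */
static bool
may_skip_full_page_image(const PruneLogInputs *in)
{
	return !changes_line_pointers(in) &&
		in->nfrozen == 0 &&
		(!in->set_pd_all_vis || !in->hint_bits_need_wal);
}
```

For any input, exactly one of the two predicates holds: a record that skipped the FPI never stamps the page LSN, and a record that could have carried an FPI always does.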



  [text/x-patch] v14-0009-Combine-vacuum-phase-I-VM-update-cases.patch (4.4K, 10-v14-0009-Combine-vacuum-phase-I-VM-update-cases.patch)
  download | inline diff:
From a88a7f88097755d430d030753c4080aa4092ef7b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 17:48:38 -0400
Subject: [PATCH v14 09/24] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
---
 src/backend/access/heap/vacuumlazy.c | 68 ++++++++--------------------
 1 file changed, 18 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 308abff16ca..5a6bbbd97f2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2058,15 +2058,22 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
+	 * Handle setting visibility map bits based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2079,6 +2086,12 @@ lazy_scan_prune(LVRelState *vacrel,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check old_vmbits, returned by visibilitymap_set(), to
+		 * ensure the visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2100,6 +2113,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/*
+	 * Now handle two potential corruption cases:
+	 *
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
 	 * page-level bit is clear.  However, it's possible that the bit got
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
@@ -2144,53 +2159,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
 
 	return presult.ndeleted;
 }
-- 
2.43.0
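
The combined condition in 09/24 can be modeled as one function that returns the VM flags to set, or 0 when no update is needed. This is a sketch under stated assumptions rather than the patch's code: vm_flags_to_set and the two macros are hypothetical names (the bit values match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN), and the vm_all_visible argument plays the role of all_visible_according_to_vm, which may be stale in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the visibility map bits. */
#define VM_ALL_VISIBLE_BIT	0x01
#define VM_ALL_FROZEN_BIT	0x02

/*
 * Decide which VM bits to set after phase I for one heap page.
 * Returns 0 if the VM already reflects the page's state.
 */
static uint8_t
vm_flags_to_set(bool page_all_visible, bool page_all_frozen,
				bool vm_all_visible, bool vm_all_frozen)
{
	uint8_t		flags;

	/* An all-frozen page is necessarily all-visible. */
	assert(!page_all_frozen || page_all_visible);

	/* Neither case applies: nothing to do. */
	if (!((page_all_visible && !vm_all_visible) ||
		  (page_all_frozen && !vm_all_frozen)))
		return 0;

	/* Both former cases set all-visible; freezing adds the second bit. */
	flags = VM_ALL_VISIBLE_BIT;
	if (page_all_frozen)
		flags |= VM_ALL_FROZEN_BIT;
	return flags;
}
```

The "set only the frozen bit" case falls out naturally: when the VM already has the all-visible bit, passing both flags re-sets a bit that is already set.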



  [text/x-patch] v14-0010-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch (9.2K, 11-v14-0010-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch)
  download | inline diff:
From aafd0b18341a03d4b48574f28694d04891555c5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 10:39:31 -0400
Subject: [PATCH v14 10/24] Vacuum phase III set PD_ALL_VISIBLE in vacuum WAL
 record

Instead of setting PD_ALL_VISIBLE on the heap page when setting bits in
the VM, set it when flipping the line pointers on the page to LP_UNUSED.
This will allow us to omit the heap page from the VM WAL chain.

To do this, we must check if the page will be all-visible once we flip
the line pointers before we actually do so.

One functional change is that a single critical section surrounds both
the VM update and the heap update. Previously they were each in a
critical section, so we could crash and have set PD_ALL_VISIBLE but not
set bits in the VM.
---
 src/backend/access/heap/vacuumlazy.c | 140 ++++++++++++++++++++-------
 1 file changed, 105 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5a6bbbd97f2..9bfcd67a61b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2793,6 +2798,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2803,6 +2809,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel, buffer,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2822,6 +2840,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	/*
+	 * The page will never have PD_ALL_VISIBLE already set, so if we are
+	 * setting the VM, we must set PD_ALL_VISIBLE as well.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+		PageSetAllVisible(page);
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2833,7 +2858,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
 								  InvalidTransactionId,
 								  false,	/* no cleanup lock required */
-								  false,
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -2842,36 +2867,26 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	}
 
 	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
+	 * Note that we don't end the critical section until after emitting the VM
+	 * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
+	 * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
+	 * to be set and the VM to be clear, we should do our best to keep these
+	 * in sync. This does mean that we will take a lock on the VM buffer
+	 * inside of a critical section, which is generally discouraged. There is
+	 * precedent for this in other callers of visibilitymap_set(), though.
 	 */
-	END_CRIT_SECTION();
 
 	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
+	 * Now that we have removed the LP_DEAD items from the page, set the
+	 * visibility map if the page became all-visible/all-frozen. Changes to
+	 * the heap page have already been logged.
 	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
 		visibilitymap_set(vacrel->rel, blkno, buffer,
 						  InvalidXLogRecPtr,
 						  vmbuffer, visibility_cutoff_xid,
-						  flags);
+						  vmflags);
 
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
@@ -2879,6 +2894,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			vacrel->vm_new_visible_frozen_pages++;
 	}
 
+	END_CRIT_SECTION();
+
 	/* Revert to the previous phase information for error traceback */
 	restore_vacuum_error_info(vacrel, &saved_err_info);
 }
@@ -3540,30 +3557,77 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
  */
 static bool
 heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid,
 						 bool *all_frozen)
 {
+
+	return heap_page_would_be_all_visible(vacrel, buf,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
+ *
+ * deadoffsets are the offsets the caller knows about and already removed
+ * associated index entries. Vacuum will call this before setting those line
+ * pointers LP_UNUSED. So, if there are no new LP_DEAD items, then the page
+ * can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ *
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
+ *
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst
+ * the visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid)
+{
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3591,9 +3655,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
-- 
2.43.0
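
The way 10/24 matches LP_DEAD items against the caller's sorted deadoffsets array can be illustrated on a simplified page model. This is a sketch only: ItemState and would_be_all_visible are hypothetical, tuple visibility is reduced to a precomputed enum value, and redirect line pointers and the all_frozen/cutoff outputs of the real function are left out. What it keeps is the single forward cursor: because the page is scanned in offset order and the array is strictly ascending, each dead item must equal the next unmatched array entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-line-pointer state; a stand-in for ItemId inspection. */
typedef enum
{
	ITEM_UNUSED,
	ITEM_NORMAL_VISIBLE,		/* tuple visible to everyone */
	ITEM_NORMAL_NOT_VISIBLE,	/* tuple not yet visible to everyone */
	ITEM_DEAD					/* LP_DEAD line pointer */
} ItemState;

/*
 * items[0..nitems-1] describe offsets 1..nitems.  deadoffsets must be
 * strictly ascending 1-based offsets that the caller is about to mark
 * unused.  Returns true if the page would be all-visible once they are.
 */
static bool
would_be_all_visible(const ItemState *items, int nitems,
					 const int *deadoffsets, int ndeadoffsets)
{
	int			matched = 0;

	assert(ndeadoffsets == 0 || deadoffsets != NULL);

	for (int off = 1; off <= nitems; off++)
	{
		switch (items[off - 1])
		{
			case ITEM_UNUSED:
			case ITEM_NORMAL_VISIBLE:
				break;
			case ITEM_NORMAL_NOT_VISIBLE:
				return false;
			case ITEM_DEAD:
				/* A dead item the caller does not know about blocks it. */
				if (matched >= ndeadoffsets || deadoffsets[matched] != off)
					return false;
				matched++;
				break;
		}
	}
	return true;
}
```

Passing NULL and 0 for the array gives the behavior of the heap_page_is_all_visible() wrapper: any LP_DEAD item makes the page not all-visible.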



  [text/x-patch] v14-0011-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch (3.0K, 12-v14-0011-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch)
  download | inline diff:
From d774b80288042d9a31cbc6477c2f0151f1c9dc2e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 18:11:49 -0400
Subject: [PATCH v14 11/24] Log setting empty pages PD_ALL_VISIBLE with
 XLOG_HEAP2_VACUUM_SCAN

Though not a big win for this particular case, if we use the
XLOG_HEAP2_VACUUM_SCAN record to log setting PD_ALL_VISIBLE on the heap
page we can omit the heap page from the WAL chain when setting the
visibility map. A follow-on commit will actually remove the heap page
from the VM set WAL chain.
---
 src/backend/access/heap/vacuumlazy.c | 43 +++++++++++++++++++---------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfcd67a61b..c016f8f7c25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,23 +1879,38 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		{
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, check whether the page
+				 * has been previously WAL-logged, and if not, do that now.
+				 *
+				 * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
+				 * heap page. Doing this in a separate record from setting the
+				 * VM allows us to omit the heap page from the VM WAL chain.
+				 */
+				if (PageGetLSN(page) == InvalidXLogRecPtr)
+					log_newpage_buffer(buf, true);
+				else
+					log_heap_prune_and_freeze(vacrel->rel, buf,
+											  InvalidTransactionId, /* conflict xid */
+											  false,	/* cleanup lock */
+											  true, /* set_pd_all_vis */
+											  PRUNE_VACUUM_SCAN,	/* reason */
+											  NULL, 0,
+											  NULL, 0,
+											  NULL, 0,
+											  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
 			visibilitymap_set(vacrel->rel, blkno, buf,
 							  InvalidXLogRecPtr,
 							  vmbuffer, InvalidTransactionId,
-- 
2.43.0
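
The choice of WAL record for an empty page in 11/24 reduces to a three-way decision. The sketch below is illustrative, not taken from the patch: EmptyPageWal and choose_empty_page_wal are hypothetical names, relation_needs_wal stands in for RelationNeedsWAL(), and the page LSN is passed as a plain 64-bit integer where 0 corresponds to InvalidXLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which record, if any, covers setting PD_ALL_VISIBLE on an empty page. */
typedef enum
{
	WAL_NONE,					/* relation is not WAL-logged */
	WAL_NEWPAGE_IMAGE,			/* page never logged: log the whole page */
	WAL_PRUNE_SET_PD_ALL_VIS	/* prune record carrying the flag only */
} EmptyPageWal;

static EmptyPageWal
choose_empty_page_wal(bool relation_needs_wal, uint64_t page_lsn)
{
	if (!relation_needs_wal)
		return WAL_NONE;

	/* An LSN of 0 means the page was initialized but never WAL-logged. */
	if (page_lsn == 0)
		return WAL_NEWPAGE_IMAGE;

	return WAL_PRUNE_SET_PD_ALL_VIS;
}
```

In both WAL-logged branches the heap page change is covered by its own record, so the later VM record does not need to reference the heap buffer.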



  [text/x-patch] v14-0012-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch (11.8K, 13-v14-0012-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch)
  download | inline diff:
From a63eed81ff73217a12cbb84b2a7f4def3366871a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 11:05:30 -0400
Subject: [PATCH v14 12/24] Remove heap buffer from XLOG_HEAP2_VISIBLE WAL
 chain

Now that all users of visibilitymap_set() include setting PD_ALL_VISIBLE
in the WAL record capturing other changes to the heap page, we no longer
need to include the heap buffer in the WAL chain for setting the VM.
---
 src/backend/access/heap/heapam.c        | 16 +-----
 src/backend/access/heap/heapam_xlog.c   | 76 +++----------------------
 src/backend/access/heap/vacuumlazy.c    |  6 +-
 src/backend/access/heap/visibilitymap.c | 31 +---------
 src/include/access/heapam_xlog.h        |  3 +-
 src/include/access/visibilitymap.h      |  2 +-
 6 files changed, 16 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c8cd9d22726..0323e2df409 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8807,21 +8807,14 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
  *
  * snapshotConflictHorizon comes from the largest xmin on the page being
  * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
  */
 XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer vm_buffer,
 				 TransactionId snapshotConflictHorizon, uint8 vmflags)
 {
 	xl_heap_visible xlrec;
 	XLogRecPtr	recptr;
-	uint8		flags;
 
-	Assert(BufferIsValid(heap_buffer));
 	Assert(BufferIsValid(vm_buffer));
 
 	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -8830,14 +8823,7 @@ log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
 		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
 	XLogBeginInsert();
 	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
 	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
 
 	return recptr;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a54238f2b59..68b41f39e69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -229,15 +229,12 @@ heap_xlog_visible(XLogReaderState *record)
 	XLogRecPtr	lsn = record->EndRecPtr;
 	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
 	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
 
 	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
 
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
+	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 
 	/*
 	 * If there are any Hot Standby transactions running that have an xmin
@@ -254,70 +251,11 @@ heap_xlog_visible(XLogReaderState *record)
 											rlocator);
 
 	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
+	 * Even if the heap relation was dropped or truncated and the previously
+	 * emitted record skipped the heap page update due to the LSN interlock,
+	 * it's still safe to update the visibility map.  Any WAL record that
+	 * clears the visibility map bit does so before checking the page LSN, so
+	 * any bits that need to be cleared will still be cleared.
 	 */
 	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
 									  &vmbuffer) == BLK_NEEDS_REDO)
@@ -341,7 +279,7 @@ heap_xlog_visible(XLogReaderState *record)
 
 		reln = CreateFakeRelcacheEntry(rlocator);
 
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+		visibilitymap_set(reln, blkno, lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
 		ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c016f8f7c25..735f1e7501e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1911,7 +1911,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 											  NULL, 0);
 			}
 
-			visibilitymap_set(vacrel->rel, blkno, buf,
+			visibilitymap_set(vacrel->rel, blkno,
 							  InvalidXLogRecPtr,
 							  vmbuffer, InvalidTransactionId,
 							  VISIBILITYMAP_ALL_VISIBLE |
@@ -2100,7 +2100,7 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
+		old_vmbits = visibilitymap_set(vacrel->rel, blkno,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
@@ -2898,7 +2898,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 */
 	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		visibilitymap_set(vacrel->rel, blkno, buffer,
+		visibilitymap_set(vacrel->rel, blkno,
 						  InvalidXLogRecPtr,
 						  vmbuffer, visibility_cutoff_xid,
 						  vmflags);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..75fcb3f067a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -233,9 +233,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * when a page that is already all-visible is being marked all-frozen.
  *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -244,7 +242,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * Returns the state of the page's VM bits before setting flags.
  */
 uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
 				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
 {
@@ -261,18 +259,11 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 #endif
 
 	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
 	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
 
 	/* Must never set all_frozen bit without also setting all_visible bit */
 	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
 
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -294,23 +285,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 			if (XLogRecPtrIsInvalid(recptr))
 			{
 				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
+				recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
 			}
 			PageSetLSN(page, recptr);
 		}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 7d3fb75dda7..82b8f7f2bbc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -440,7 +440,6 @@ typedef struct xl_heap_inplace
  * This is what we need to know about setting a visibility map bit
  *
  * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
  */
 typedef struct xl_heap_visible
 {
@@ -493,7 +492,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel,
 								   Buffer vm_buffer,
 								   TransactionId snapshotConflictHorizon,
 								   uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..302adf4856a 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,7 +32,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
 extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
+							   BlockNumber heapBlk,
 							   XLogRecPtr recptr,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
-- 
2.43.0



  [text/x-patch] v14-0014-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (21.5K, 14-v14-0014-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From 1fc1a338e5d6621f89df46fe29d08c799267b39d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 15:52:18 -0400
Subject: [PATCH v14 14/24] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III

Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that is rendered all-visible by vacuum's third phase, include the
updates to the VM in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP
record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune struct.

This can decrease the number of WAL records vacuum phase III emits by
as much as half.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
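The flag translation described above can be sketched in isolation. This is
a minimal standalone sketch, not code from the patch: the helper names
vmflags_to_xlhp() and xlhp_to_vmflags() are hypothetical, and the
VISIBILITYMAP_* values are assumed to match visibilitymapdefs.h. It also
shows why xl_heap_prune.flags is widened to uint16: bits 8 and 9 do not
fit in a uint8.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Bit positions as defined by this patch in heapam_xlog.h */
#define XLHP_VM_ALL_VISIBLE			(1 << 8)
#define XLHP_VM_ALL_FROZEN			(1 << 9)

/*
 * Hypothetical helper: encode VM bits into xl_heap_prune-style flags,
 * mirroring what log_heap_prune_and_freeze() does inline.
 */
static uint16_t
vmflags_to_xlhp(uint8_t vmflags)
{
	uint16_t	flags = 0;

	assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
	/* all-frozen must never be set without all-visible */
	assert(vmflags != VISIBILITYMAP_ALL_FROZEN);

	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
	{
		flags |= XLHP_VM_ALL_VISIBLE;
		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
			flags |= XLHP_VM_ALL_FROZEN;
	}
	return flags;
}

/*
 * Hypothetical helper: decode the VM bits again, mirroring what
 * heap_xlog_prune_freeze() and heap2_desc() do inline.
 */
static uint8_t
xlhp_to_vmflags(uint16_t xlhp_flags)
{
	uint8_t		vmflags = 0;

	if (xlhp_flags & XLHP_VM_ALL_VISIBLE)
	{
		vmflags = VISIBILITYMAP_ALL_VISIBLE;
		if (xlhp_flags & XLHP_VM_ALL_FROZEN)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}
```

As in the patch, the decode side ignores XLHP_VM_ALL_FROZEN unless
XLHP_VM_ALL_VISIBLE is also set.
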
 src/backend/access/heap/heapam_xlog.c  | 147 ++++++++++++++++++-------
 src/backend/access/heap/pruneheap.c    |  37 ++++++-
 src/backend/access/heap/vacuumlazy.c   |  38 +++----
 src/backend/access/rmgrdesc/heapdesc.c |  11 +-
 src/include/access/heapam.h            |   1 +
 src/include/access/heapam_xlog.h       |  25 ++++-
 6 files changed, 190 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 68b41f39e69..c1f332f7a9a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+	{
+		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -100,6 +113,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 
 		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS ||
+			   xlrec.flags & XLHP_SET_PD_ALL_VIS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
@@ -147,15 +165,23 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		 * page-level PD_ALL_VISIBLE bit is clear.  If that were to occur,
 		 * then a subsequent page modification would fail to clear the
 		 * visibility map bit.
+		 *
+		 * Note: we don't worry about updating the page's prunability hints.
+		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 		if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
 			PageSetAllVisible(page);
 
 		/*
-		 * Note: we don't worry about updating the page's prunability hints.
-		 * At worst this will cause an extra prune cycle to occur soon.
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
 		 */
-		MarkBufferDirty(buffer);
+		Assert(!(vmflags & VISIBILITYMAP_VALID_BITS) || PageIsAllVisible(page));
+
+		/* If this record only sets the VM, no need to dirty the heap page */
+		if (do_prune || nplans > 0 || xlrec.flags & XLHP_SET_PD_ALL_VIS)
+			MarkBufferDirty(buffer);
 
 		/*
 		 * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
@@ -171,47 +197,94 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we released any space or line pointers or set PD_ALL_VISIBLE update
-	 * the freespace map.
+	 * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+	 * VM, update the freespace map.
 	 *
-	 * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
-	 * space), we'll still update the FSM for this page. Since the FSM is not
-	 * WAL-logged and only updated heuristically, it easily becomes stale in
-	 * standbys.  If the standby is later promoted and runs VACUUM, it will
-	 * skip updating individual free space figures for pages that became
-	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
-	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
-	 * space values to upper FSM layers; later inserters try to use such pages
-	 * only to find out that they are unusable.  This can cause long stalls
-	 * when there are many such pages.
+	 * Even if we are just setting PD_ALL_VISIBLE or updating the VM (and thus
+	 * not freeing up any space), we'll still update the FSM for this page.
+	 * Since the FSM is not WAL-logged and only updated heuristically, it
+	 * easily becomes stale in standbys.  If the standby is later promoted and
+	 * runs VACUUM, it will skip updating individual free space figures for
+	 * pages that became all-visible (or all-frozen, depending on the vacuum
+	 * mode,) which is troublesome when FreeSpaceMapVacuum propagates too
+	 * optimistic free space values to upper FSM layers; later inserters try
+	 * to use such pages only to find out that they are unusable.  This can
+	 * cause long stalls when there are many such pages.
 	 *
 	 * Forestall those problems by updating FSM's idea about a page that is
 	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
 		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
 						   XLHP_HAS_DEAD_ITEMS |
 						   XLHP_HAS_NOW_UNUSED_ITEMS |
-						   XLHP_SET_PD_ALL_VIS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+						   XLHP_SET_PD_ALL_VIS |
+						   (vmflags & VISIBILITYMAP_VALID_BITS)))
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
-			/*
-			 * We want to avoid holding an exclusive lock on the heap buffer
-			 * while doing IO, so we'll release the lock on the heap buffer
-			 * first.
-			 */
-			UnlockReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
+	}
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * Note that the heap relation may have been dropped or truncated, leading
+	 * us to skip updating the heap block due to the LSN interlock. However,
+	 * even in that case, it's still safe to update the visibility map. Any
+	 * WAL record that clears the visibility map bit does so before checking
+	 * the page LSN, so any bits that need to be cleared will still be
+	 * cleared.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		uint8		old_vmbits = 0;
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+
+		FreeFakeRelcacheEntry(reln);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b25131543b..9e00fbf3cd1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
 #include "access/multixact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "access/visibilitymapdefs.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
@@ -913,6 +914,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0,
 									  conflict_xid,
 									  true,
 									  do_set_pd_vis,
@@ -2088,14 +2090,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2103,6 +2109,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2113,6 +2123,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  bool set_pd_all_vis,
@@ -2139,6 +2150,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
@@ -2157,6 +2170,10 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	XLogBeginInsert();
 	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2213,6 +2230,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+	{
+		xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+			xlrec.flags |= XLHP_VM_ALL_FROZEN;
+	}
 	if (set_pd_all_vis)
 		xlrec.flags |= XLHP_SET_PD_ALL_VIS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
@@ -2247,6 +2270,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
 	/*
 	 * We must bump the page LSN if pruning or freezing. If we are only
 	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0f3984e37f..b6c973cd111 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1906,6 +1906,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 					log_newpage_buffer(buf, true);
 				else
 					log_heap_prune_and_freeze(vacrel->rel, buf,
+											  InvalidBuffer,
+											  0,
 											  InvalidTransactionId, /* conflict xid */
 											  false,	/* cleanup lock */
 											  true, /* set_pd_all_vis */
@@ -2817,6 +2819,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 	uint8		vmflags = 0;
@@ -2842,6 +2845,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
 			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 	}
 
 	START_CRIT_SECTION();
@@ -2868,7 +2874,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * setting the VM, we must set PD_ALL_VISIBLE as well.
 	 */
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
 		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(vacrel->rel,
+								 blkno,
+								 vmbuffer, vmflags);
+		conflict_xid = visibility_cutoff_xid;
+	}
 
 	/*
 	 * Mark buffer dirty before we write WAL.
@@ -2879,7 +2891,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer, vmflags,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
@@ -2889,36 +2902,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * Note that we don't end the critical section until after emitting the VM
-	 * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
-	 * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
-	 * to be set and the VM to be clear, we should do our best to keep these
-	 * in sync. This does mean that we will take a lock on the VM buffer
-	 * inside of a critical section, which is generally discouraged. There is
-	 * precedent for this in other callers of visibilitymap_set(), though.
-	 */
+	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, set the
-	 * visibility map if the page became all-visible/all-frozen. Changes to
-	 * the heap page have already been logged.
-	 */
 	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		visibilitymap_set(vacrel->rel, blkno,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  vmflags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
 	}
 
-	END_CRIT_SECTION();
-
 	/* Revert to the previous phase information for error traceback */
 	restore_vacuum_error_info(vacrel, &saved_err_info);
 }
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+		{
+			uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2f77d8dbcd6..be66970c9f0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -389,6 +389,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  bool set_pd_all_vis,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 82b8f7f2bbc..833114e0a6e 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,11 +292,17 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
 #define		XLHP_SET_PD_ALL_VIS			(1 << 0)
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -332,6 +338,15 @@ typedef struct xl_heap_prune
 #define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
 #define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
 
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define		XLHP_VM_ALL_VISIBLE			(1 << 8)
+#define		XLHP_VM_ALL_FROZEN			(1 << 9)
+
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
  * (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -498,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
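
The flag layout introduced by this patch can be illustrated with a small standalone sketch. It is only an illustration under stated assumptions: the constants are copied by hand from the patch and from visibilitymapdefs.h rather than included from PostgreSQL headers, and xlhp_flags_to_vmflags() is a hypothetical helper name that does not appear in the patch set. It mirrors the translation done in heap2_desc() and suggests why xl_heap_prune.flags had to widen to uint16: bits 8 and 9 do not survive truncation to uint8.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Values mirrored by hand for illustration only; the real definitions live
 * in heapam_xlog.h (as changed by this patch) and visibilitymapdefs.h.
 */
#define XLHP_VM_ALL_VISIBLE        (1 << 8)
#define XLHP_VM_ALL_FROZEN         (1 << 9)
#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_ALL_FROZEN   0x02

/* Hypothetical helper: map xl_heap_prune flags to visibility map bits. */
static uint8_t
xlhp_flags_to_vmflags(uint16_t flags)
{
	uint8_t		vmflags = 0;

	/* Assumed invariant: all-frozen is never set without all-visible */
	assert((flags & XLHP_VM_ALL_FROZEN) == 0 ||
		   (flags & XLHP_VM_ALL_VISIBLE) != 0);

	if (flags & XLHP_VM_ALL_VISIBLE)
	{
		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
		if (flags & XLHP_VM_ALL_FROZEN)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}

	return vmflags;
}
```

Keeping the XLHP_VM_* bits distinct from the VISIBILITYMAP_* bits means the record flags can grow without colliding with the existing low-order XLHP_* flags, at the cost of one explicit translation step wherever the record is read.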



  [text/x-patch] v14-0013-Make-heap_page_is_all_visible-independent-of-LVR.patch (6.8K, 15-v14-0013-Make-heap_page_is_all_visible-independent-of-LVR.patch)
From 2f820f93bfe273ed9b9867d3ddc9f4c67dd94296 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 15:39:31 -0400
Subject: [PATCH v14 13/24] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 735f1e7501e..a0f3984e37f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,18 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid);
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,8 +2035,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2824,9 +2830,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
-	if (heap_page_would_be_all_visible(vacrel, buffer,
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid))
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
@@ -3576,15 +3584,19 @@ dead_items_cleanup(LVRelState *vacrel)
  * callers that expect no LP_DEAD on the page.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(vacrel, buf,
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid);
+										  visibility_cutoff_xid,
+										  logging_offnum);
 }
 
 /*
@@ -3599,7 +3611,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ * OldestXmin is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3607,6 +3619,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
  * visible tuples. It is only valid if the page is all-visible.
  *
+ * *logging_offnum is set to the OffsetNumber of the tuple currently being
+ * processed, for use by vacuum's error callback system.
+ *
  * Callers looking to verify that the page is already all-visible can call
  * heap_page_is_all_visible().
  *
@@ -3616,11 +3631,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid)
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3655,7 +3672,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3685,9 +3702,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3708,7 +3725,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3743,7 +3760,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0
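
The refactoring in this patch replaces writes to vacrel->offnum with writes through an explicit out-parameter. A minimal standalone sketch of that pattern follows. It makes several assumptions: the OffsetNumber typedef and its two constants are stand-ins written by hand, page_would_be_all_visible() is a hypothetical miniature of the real function, and the item_visible[] array replaces the per-tuple HeapTupleSatisfiesVacuum() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;	/* stand-in for PostgreSQL's typedef */
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define FirstOffsetNumber   ((OffsetNumber) 1)

/*
 * Hypothetical miniature of heap_page_would_be_all_visible().
 * item_visible[] stands in for the per-tuple visibility checks.
 */
static bool
page_would_be_all_visible(const bool *item_visible, int nitems,
						  OffsetNumber *logging_offnum)
{
	bool		all_visible = true;

	assert(logging_offnum != NULL);

	for (OffsetNumber offnum = FirstOffsetNumber;
		 offnum <= nitems && all_visible;
		 offnum++)
	{
		/* Publish the offset so an error callback could report it */
		*logging_offnum = offnum;

		if (!item_visible[offnum - 1])
			all_visible = false;
	}

	/* Clear the offset once the page has been processed */
	*logging_offnum = InvalidOffsetNumber;

	return all_visible;
}
```

A vacuum-style caller would pass the address of its own error-reporting field (the patch passes &vacrel->offnum), while a caller without an LVRelState can pass any OffsetNumber it owns.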



  [text/x-patch] v14-0015-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch (3.3K, 16-v14-0015-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch)
From ed61f88812f33cb96cebeabc5c9c43a11cdd5a3e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 16:04:18 -0400
Subject: [PATCH v14 15/24] Set empty pages all-visible in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN record

As part of a project to eliminate XLOG_HEAP2_VISIBLE records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/vacuumlazy.c | 55 +++++++++++++++++-----------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b6c973cd111..e01fc5bb502 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1882,11 +1882,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			bool		set_pd_all_vis = true;
+
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
+			visibilitymap_set_vmbits(vacrel->rel, blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
+
 			if (RelationNeedsWAL(vacrel->rel))
 			{
 				/*
@@ -1897,34 +1907,37 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				 * all-visible and find that the page isn't initialized, which
 				 * will cause a PANIC. To prevent that, check whether the page
 				 * has been previously WAL-logged, and if not, do that now.
-				 *
-				 * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
-				 * heap page. Doing this in a separate record from setting the
-				 * VM allows us to omit the heap page from the VM WAL chain.
 				 */
 				if (PageGetLSN(page) == InvalidXLogRecPtr)
+				{
 					log_newpage_buffer(buf, true);
-				else
-					log_heap_prune_and_freeze(vacrel->rel, buf,
-											  InvalidBuffer,
-											  0,
-											  InvalidTransactionId, /* conflict xid */
-											  false,	/* cleanup lock */
-											  true, /* set_pd_all_vis */
-											  PRUNE_VACUUM_SCAN,	/* reason */
-											  NULL, 0,
-											  NULL, 0,
-											  NULL, 0,
-											  NULL, 0);
+					set_pd_all_vis = false;
+				}
+
+				/*
+				 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+				 * setting the VM. If we emitted a new page record for the
+				 * page above, setting PD_ALL_VISIBLE will already have been
+				 * included in that record.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  set_pd_all_vis,
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 			}
 
-			visibilitymap_set(vacrel->rel, blkno,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
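
The WAL decisions this patch makes for an empty page that is not yet marked all-visible can be summarized in a small standalone sketch. EmptyPageWalPlan and plan_empty_page_wal() are hypothetical names used only for illustration, and the two boolean inputs stand in for RelationNeedsWAL() and for the PageGetLSN() check against InvalidXLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what lazy_scan_new_or_empty() would log */
typedef struct EmptyPageWalPlan
{
	bool		emit_newpage;		/* log_newpage_buffer() */
	bool		emit_prune_record;	/* log_heap_prune_and_freeze() */
	bool		set_pd_all_vis;		/* prune record sets PD_ALL_VISIBLE */
} EmptyPageWalPlan;

static EmptyPageWalPlan
plan_empty_page_wal(bool relation_needs_wal, bool page_lsn_is_valid)
{
	EmptyPageWalPlan plan = {false, false, false};

	if (!relation_needs_wal)
		return plan;

	plan.set_pd_all_vis = true;
	if (!page_lsn_is_valid)
	{
		/* Page was never WAL-logged: log it whole, flag included */
		plan.emit_newpage = true;
		plan.set_pd_all_vis = false;
	}

	/* The prune record now always carries the VM update */
	plan.emit_prune_record = true;

	/* A new-page record already carries PD_ALL_VISIBLE, so never both */
	assert(!(plan.emit_newpage && plan.set_pd_all_vis));

	return plan;
}
```

The difference from the previous code is that the prune record is no longer an alternative to the new-page record: with the VM update folded in, it is emitted in both cases.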



  [text/x-patch] v14-0016-Set-VM-in-heap_page_prune_and_freeze.patch (22.3K, 17-v14-0016-Set-VM-in-heap_page_prune_and_freeze.patch)
From 6d11a7bf77706bc4ddbdb156f25f9c53d4b1e615 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 15:46:40 -0400
Subject: [PATCH v14 16/24] Set VM in heap_page_prune_and_freeze

The determination as to whether or not the page can be set
all-visible/all-frozen has already been done by the end of
heap_page_prune_and_freeze(). Vacuum currently waits until it returns to
lazy_scan_prune() to actually set the VM, though.

This commit moves setting the VM into heap_page_prune_and_freeze().
There are still two separate WAL records -- one for the changes to the
heap page and one for the changes to the VM. But, this is an incremental
step toward logging setting the VM in the same WAL record as pruning and
freezing.

Note that this is not used by on-access pruning.
---
 src/backend/access/heap/pruneheap.c  | 221 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 146 ++----------------
 src/include/access/heapam.h          |  24 +--
 3 files changed, 221 insertions(+), 170 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9e00fbf3cd1..e3f9967e26c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/visibilitymapdefs.h"
 #include "access/xloginsert.h"
@@ -257,7 +258,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+			heap_page_prune_and_freeze(relation, buffer,
+									   InvalidBuffer, false,
+									   PRUNE_ON_ACCESS, 0, NULL,
 									   vistest, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -423,16 +426,115 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || PageIsAllVisible(heap_page) || *do_set_pd_vis);
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * vmbuffer is the buffer that must already contain the required block
+ * of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ *
  * reason indicates why the pruning is performed.  It is included in the WAL
  * record for debugging and analysis purposes, but otherwise has no effect.
  *
@@ -443,15 +545,20 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
+ *   UPDATE_VIS indicates that we will set the page's status in the VM.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
@@ -478,6 +585,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   Buffer vmbuffer, bool blk_known_av,
 						   PruneReason reason,
 						   int options,
 						   const struct VacuumCutoffs *cutoffs,
@@ -496,10 +604,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
@@ -828,19 +939,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
-	 * allowed for the page-level bit to be set and the VM to be clear.
+	 * Determine whether or not to set the page level PD_ALL_VISIBLE and the
+	 * visibility map bits based on information from the VM and from
+	 * all_visible and all_frozen variables.
+	 *
+	 * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
+	 * allowed for the page-level bit to be set and the VM to be clear. We log
+	 * setting PD_ALL_VISIBLE on the heap page in an
+	 * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a
+	 * later-emitted XLOG_HEAP2_VISIBLE record.
+	 *
 	 * Setting PD_ALL_VISIBLE when we are making the changes to the page that
 	 * render it all-visible allows us to omit the heap page from the WAL
 	 * chain when later updating the VM -- even when checksums/wal_log_hints
 	 * are enabled.
 	 */
 	do_set_pd_vis = false;
+	do_set_vm = false;
 	if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
-	{
-		if (prstate.all_visible && !PageIsAllVisible(page))
-			do_set_pd_vis = true;
-	}
+		do_set_vm = heap_page_will_set_vis(relation,
+										   blockno, buffer, vmbuffer, blk_known_av,
+										   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -928,28 +1047,72 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * VACUUM will call heap_page_would_be_all_visible() during the second
+	 * pass over the heap to determine all_visible and all_frozen for the page
+	 * -- this is a specialized version of that logic. Now that we've finished
+	 * pruning and freezing, make sure that we're in total agreement with
+	 * heap_page_would_be_all_visible() using an assertion.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
+	/* Now set the VM */
+	if (do_set_vm)
+	{
+		TransactionId vm_conflict_horizon;
+
+		Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
+
+		/*
+		 * The conflict horizon for that record must be the newest xmin on the
+		 * page.  However, if the page is completely frozen, there can be no
+		 * conflict and the vm_conflict_horizon should remain
+		 * InvalidTransactionId.  This includes the case that we just froze
+		 * all the tuples; the prune-freeze record included the conflict XID
+		 * already so a snapshotConflictHorizon sufficient to make everything
+		 * safe for REDO was logged when the page's tuples were frozen.
+		 */
+		if (prstate.all_frozen)
+			vm_conflict_horizon = InvalidTransactionId;
+		else
+			vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		old_vmbits = visibilitymap_set(relation, blockno,
+									   InvalidXLogRecPtr,
+									   vmbuffer, vm_conflict_horizon,
+									   new_vmbits);
+	}
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e01fc5bb502..8ec0476a0d4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
@@ -2014,7 +2009,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   vmbuffer, all_visible_according_to_vm,
+							   PRUNE_VACUUM_SCAN, prune_options,
 							   &vacrel->cutoffs,
 							   vacrel->vistest,
 							   &presult,
@@ -2035,33 +2032,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2095,112 +2065,28 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bits based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * Now handle two potential corruption cases:
-	 *
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
-
 	return presult.ndeleted;
 }
 
@@ -3590,7 +3476,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Wrapper for heap_page_would_be_all_visible() which can be used for
  * callers that expect no LP_DEAD on the page.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId OldestXmin,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index be66970c9f0..797cd51145d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,14 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -375,6 +370,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   Buffer vmbuffer, bool blk_known_av,
 									   PruneReason reason,
 									   int options,
 									   const struct VacuumCutoffs *cutoffs,
@@ -403,6 +399,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
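
The counter updates at the end of the vacuumlazy.c hunk above can be read as a small classification over the old and new VM bits. The following is only a sketch under stated assumptions, not code from the patch: `VmCounters`, `count_vm_change()` and the `SKETCH_*` macros are hypothetical names, and the first condition (page was not all-visible before and is now) is assumed, because the opening `if` lies outside the hunk shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three LVRelState logging counters. */
typedef struct VmCounters
{
	int			new_visible;		/* vm_new_visible_pages */
	int			new_visible_frozen; /* vm_new_visible_frozen_pages */
	int			new_frozen;			/* vm_new_frozen_pages */
} VmCounters;

/*
 * Classify one page's VM transition and bump the matching counters.
 * Returns true if the page became all-frozen in the VM.
 */
static bool
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		page_frozen = false;

	/* Assumed opening condition: the page is newly all-visible. */
	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0)
	{
		c->new_visible++;
		if ((new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen++;
			page_frozen = true;
		}
	}
	/* Already all-visible, newly all-frozen. */
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
	{
		assert((new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0);
		c->new_frozen++;
		page_frozen = true;
	}

	return page_frozen;
}
```

A page whose bits do not change falls through both branches, so none of the counters move and the function reports that the page was not newly frozen.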



  [text/x-patch] v14-0017-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (12.6K, 18-v14-0017-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From 9904f827846bb2660dbc9ff0ecb1d24dbe9dc3bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 17:29:59 -0400
Subject: [PATCH v14 17/24] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the XLOG_HEAP2_PRUNE_VACUUM_SCAN record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c | 183 +++++++++++++++++-----------
 src/include/access/heapam.h         |   3 +-
 2 files changed, 112 insertions(+), 74 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e3f9967e26c..a14c793da7e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -662,50 +662,58 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VIS is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
+	 *
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing and not updating the VM, we avoid the extra bookkeeping.
+	 * Initializing all_visible and all_frozen to false allows skipping the
+	 * work to update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -943,16 +951,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * visibility map bits based on information from the VM and from
 	 * all_visible and all_frozen variables.
 	 *
-	 * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
-	 * allowed for the page-level bit to be set and the VM to be clear. We log
-	 * setting PD_ALL_VISIBLE on the heap page in a
-	 * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
-	 * emitted XLOG_HEAP2_VISIBLE record.
+	 * It is allowed for the page-level bit to be set and the VM to be clear;
+	 * however, we have a strong preference for keeping them in sync.
 	 *
-	 * Setting PD_ALL_VISIBLE when we are making the changes to the page that
-	 * render it all-visible allows us to omit the heap page from the WAL
-	 * chain when later updating the VM -- even when checksums/wal_log_hints
-	 * are enabled.
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 *
+	 * As such, we may need to update only the VM because PD_ALL_VISIBLE is
+	 * already set.
 	 */
 	do_set_pd_vis = false;
 	do_set_vm = false;
@@ -961,6 +968,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 										   blockno, buffer, vmbuffer, blk_known_av,
 										   &prstate, &new_vmbits, &do_set_pd_vis);
 
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -991,7 +1002,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze || do_set_pd_vis)
+	if (do_prune || do_freeze || do_set_pd_vis || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1008,12 +1019,31 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_pd_vis)
 			PageSetAllVisible(page);
 
-		MarkBufferDirty(buffer);
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+
+			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+												  vmbuffer, new_vmbits);
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(relation) &&
+			(do_prune || do_freeze || do_set_pd_vis || do_set_vm))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -1025,15 +1055,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid. This will have been calculated
+			 * earlier as the frz_conflict_horizon when we determined we would
+			 * freeze.
+			 */
+			if (do_set_vm)
+				conflict_xid = prstate.visibility_cutoff_xid;
+			else if (do_freeze)
 				conflict_xid = frz_conflict_horizon;
-			else
+
+			/*
+			 * If we are removing tuples with an xmax younger than the
+			 * conflict_xid calculated so far, use that xmax as the horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
-									  InvalidBuffer, 0,
+									  vmbuffer, new_vmbits,
 									  conflict_xid,
 									  true,
 									  do_set_pd_vis,
@@ -1047,6 +1107,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
@@ -1078,32 +1141,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 #endif
 
-	/* Now set the VM */
-	if (do_set_vm)
-	{
-		TransactionId vm_conflict_horizon;
-
-		Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
-
-		/*
-		 * The conflict horizon for that record must be the newest xmin on the
-		 * page.  However, if the page is completely frozen, there can be no
-		 * conflict and the vm_conflict_horizon should remain
-		 * InvalidTransactionId.  This includes the case that we just froze
-		 * all the tuples; the prune-freeze record included the conflict XID
-		 * already so a snapshotConflictHorizon sufficient to make everything
-		 * safe for REDO was logged when the page's tuples were frozen.
-		 */
-		if (prstate.all_frozen)
-			vm_conflict_horizon = InvalidTransactionId;
-		else
-			vm_conflict_horizon = prstate.visibility_cutoff_xid;
-		old_vmbits = visibilitymap_set(relation, blockno,
-									   InvalidXLogRecPtr,
-									   vmbuffer, vm_conflict_horizon,
-									   new_vmbits);
-	}
-
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
@@ -2261,7 +2298,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 797cd51145d..cac7a4c2899 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -239,7 +239,8 @@ typedef struct PruneFreezeResult
 	 * visibility map before updating it during phase I of vacuuming.
 	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have actually updated the VM.
 	 */
 	uint8		new_vmbits;
 	uint8		old_vmbits;
-- 
2.43.0
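
The order in which patch 17 picks the snapshot conflict horizon may be easier to follow as a standalone function. This is a minimal sketch assuming the three steps in the hunk above (start from the VM or freeze horizon, raise it to the newest removed xid, then drop it entirely for the "already all-visible, now all-frozen" case). `Xid`, `PruneDecision`, `xid_follows()` and `choose_conflict_xid()` are hypothetical names for illustration; `xid_follows()` only approximates the wraparound-aware comparison done by `TransactionIdFollows()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;			/* stand-in for TransactionId */

#define SKETCH_INVALID_XID		((Xid) 0)
#define SKETCH_FIRST_NORMAL_XID	((Xid) 3)
#define SKETCH_VM_ALL_FROZEN	0x02

/* Hypothetical bundle of the local state the patch consults. */
typedef struct PruneDecision
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		blk_known_av;
	uint8_t		new_vmbits;
	Xid			visibility_cutoff_xid;
	Xid			frz_conflict_horizon;
	Xid			latest_xid_removed;
} PruneDecision;

/* Approximation of TransactionIdFollows(): is a logically newer than b? */
static bool
xid_follows(Xid a, Xid b)
{
	/* Special (non-normal) xids compare as plain unsigned values. */
	if (a < SKETCH_FIRST_NORMAL_XID || b < SKETCH_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

static Xid
choose_conflict_xid(const PruneDecision *d)
{
	Xid			conflict_xid = SKETCH_INVALID_XID;

	/* Step 1: VM update takes the visibility cutoff, else the freeze horizon. */
	if (d->do_set_vm)
		conflict_xid = d->visibility_cutoff_xid;
	else if (d->do_freeze)
		conflict_xid = d->frz_conflict_horizon;

	/* Step 2: removed tuples with a newer xid raise the horizon. */
	if (xid_follows(d->latest_xid_removed, conflict_xid))
		conflict_xid = d->latest_xid_removed;

	/* Step 3: VM-only change on a known all-visible page needs no horizon. */
	if (!d->do_prune && !d->do_freeze && d->do_set_vm &&
		d->blk_known_av && (d->new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
		conflict_xid = SKETCH_INVALID_XID;

	return conflict_xid;
}
```

Keeping step 3 last mirrors the patch: the record carries no horizon only when nothing on the heap page changes and the page was already known to be all-visible.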



  [text/x-patch] v14-0018-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (21.1K, 19-v14-0018-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From 6f94908b0649956e1d1abbbd5c362a57282c2c26 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 17:42:54 -0400
Subject: [PATCH v14 18/24] Remove XLOG_HEAP2_VISIBLE entirely

There are now no users of this, so eliminate it entirely.
This includes the xl_heap_visible struct as well as all of the functions
used to emit and replay XLOG_HEAP2_VISIBLE records.

ci-os-only:
---
 src/backend/access/common/bufmask.c      |  4 +-
 src/backend/access/heap/heapam.c         | 40 ++--------
 src/backend/access/heap/heapam_xlog.c    | 96 +++---------------------
 src/backend/access/heap/pruneheap.c      |  4 +-
 src/backend/access/heap/vacuumlazy.c     | 14 ++--
 src/backend/access/heap/visibilitymap.c  | 83 +-------------------
 src/backend/access/rmgrdesc/heapdesc.c   | 10 ---
 src/backend/replication/logical/decode.c |  1 -
 src/backend/storage/ipc/standby.c        | 12 +--
 src/include/access/heapam_xlog.h         | 19 -----
 src/include/access/visibilitymap.h       | 11 +--
 src/include/access/visibilitymapdefs.h   |  9 ---
 src/tools/pgindent/typedefs.list         |  1 -
 13 files changed, 36 insertions(+), 268 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0323e2df409..ab514ce65ec 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(relation,
-									 BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(relation,
+							  BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 		}
 
 		/*
@@ -8799,36 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-	XLogRegisterBuffer(0, vm_buffer, 0);
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c1f332f7a9a..a8908373067 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,8 +251,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
 	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * is also done this way when replaying COPY FREEZE records (see
+	 * heap_xlog_multi_insert()).
 	 */
 	if (vmflags & VISIBILITYMAP_VALID_BITS &&
 		XLogReadBufferForRedoExtended(record, 1,
@@ -268,7 +268,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -287,81 +287,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
- * the heap page. We must never end up with a situation where the visibility
- * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear.  If that
- * were to occur, then a subsequent page modification would fail to clear the
- * visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Even if the heap relation was dropped or truncated and the previously
-	 * emitted record skipped the heap page update due to this LSN interlock,
-	 * it's still safe to update the visibility map.  Any WAL record that
-	 * clears the visibility map bit does so before checking the page LSN, so
-	 * any bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -739,8 +664,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
 	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * is also done this way when replaying xl_heap_prune records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -753,10 +678,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(reln, blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN);
+		visibilitymap_set(reln, blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
@@ -1342,9 +1267,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a14c793da7e..39d59a43ff7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1026,8 +1026,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(relation, blockno,
-												  vmbuffer, new_vmbits);
+			old_vmbits = visibilitymap_set(relation, blockno,
+										   vmbuffer, new_vmbits);
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ec0476a0d4..28436389d63 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,10 +1887,10 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			visibilitymap_set_vmbits(vacrel->rel, blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(vacrel->rel, blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN);
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2775,9 +2775,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(vacrel->rel,
-								 blkno,
-								 vmbuffer, vmflags);
+		visibilitymap_set(vacrel->rel,
+						  blkno,
+						  vmbuffer, vmflags);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 75fcb3f067a..38d3131e56b 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,82 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
@@ -318,8 +241,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 833114e0a6e..61ceaf2a98b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -451,19 +450,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -507,11 +493,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 302adf4856a..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e90af5b2ad3..32c0f4719c3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4268,7 +4268,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
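
For readers following the visibilitymap_set() body removed above, here is a minimal sketch of the bit arithmetic it performs. It assumes two VM bits per heap block (so four heap blocks per map byte) and uses a tiny in-memory array in place of a pinned, locked VM page. The toy_* names and TOY_MAP_BYTES are hypothetical and exist only for this illustration; locking, WAL and LSN handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in visibilitymapdefs.h. */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02
#define VM_VALID_BITS	0x03

/* Assumed layout: 2 bits per heap block, hence 4 heap blocks per byte. */
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/* Hypothetical, tiny map standing in for one VM page. */
#define TOY_MAP_BYTES		16

static uint8_t toy_map[TOY_MAP_BYTES];

/* Read the two status bits that belong to heap_blk. */
static uint8_t
toy_vm_get(uint32_t heap_blk)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint32_t	map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(map_byte < TOY_MAP_BYTES);
	return (toy_map[map_byte] >> map_offset) & VM_VALID_BITS;
}

/* OR flags into the bits for heap_blk and return the previous status. */
static uint8_t
toy_vm_set(uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint32_t	map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert(map_byte < TOY_MAP_BYTES);
	assert((flags & VM_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible */
	assert(flags != VM_ALL_FROZEN);

	status = (toy_map[map_byte] >> map_offset) & VM_VALID_BITS;
	if (flags != status)
		toy_map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Because the bits are ORed in, a call can add all-frozen to a block that is already all-visible, but it can never clear a bit; clearing is the job of visibilitymap_clear().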



  [text/x-patch] v14-0019-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 20-v14-0019-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From cbfb5ee8a412651c604307cd0bd611f187ed348a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v14 19/24] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, in order to determine whether the page can be set
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll(). The FullTransactionId variant,
GlobalVisTestIsRemovableFullXid(), is renamed to
GlobalVisFullXidVisibleToAll() to match.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 39d59a43ff7..471151fae2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -218,7 +218,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -727,9 +727,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1199,11 +1199,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v14-0020-Use-GlobalVisState-to-determine-page-level-visib.patch (10.7K, 21-v14-0020-Use-GlobalVisState-to-determine-page-level-visib.patch)
  download | inline diff:
From aeb0c7ed54566dfd8b67d4ad50d46938b1ccf95d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v14 20/24] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. Then, if we
have maintained the visibility_cutoff_xid, we compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, since
previously we set all_visible to false as soon as we encountered a live
tuple newer than OldestXmin. However, these extra comparisons were found
not to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 +++++++++++++
 src/backend/access/heap/pruneheap.c         | 46 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 20 ++++-----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 471151fae2e..bb7a1357a89 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -134,10 +134,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
 	 * convenient for heap_page_prune_and_freeze(), to use them to decide
@@ -706,14 +705,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -909,6 +906,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1129,7 +1136,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1655,19 +1662,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 28436389d63..341115dbbbe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2733,7 +2733,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3478,14 +3478,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3504,7 +3503,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3525,7 +3524,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3597,8 +3596,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3617,8 +3616,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cac7a4c2899..35a25cf0b04 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -401,7 +401,7 @@ extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -413,6 +413,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
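
The "compare once per page" idea from the commit message above can be sketched as follows. This is a simplified model, not the patch's code: ToyVisState is a hypothetical stand-in for GlobalVisState that holds a single horizon and counts how often it is consulted, xids are compared with a plain `<` (ignoring wraparound), and 0 plays the role of InvalidTransactionId.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Hypothetical stand-in for GlobalVisState. */
typedef struct ToyVisState
{
	Xid			horizon;	/* xids below this are visible to everyone */
	int			ncalls;		/* number of horizon checks performed */
} ToyVisState;

/* The "expensive" check we want to run as rarely as possible. */
static bool
toy_visible_to_all(ToyVisState *vis, Xid xid)
{
	vis->ncalls++;
	return xid < vis->horizon;	/* sketch only: ignores xid wraparound */
}

/*
 * Track the newest xmin among the live tuples while scanning the page,
 * then consult the horizon a single time at the end.
 */
static bool
toy_page_all_visible(ToyVisState *vis, const Xid *xmins, int ntuples)
{
	Xid			newest_xmin = 0;	/* 0 means "nothing tracked" here */

	for (int i = 0; i < ntuples; i++)
	{
		if (xmins[i] > newest_xmin)
			newest_xmin = xmins[i];
	}

	/* No tracked xmin: nothing on the page can fail the check. */
	if (newest_xmin == 0)
		return true;

	return toy_visible_to_all(vis, newest_xmin);
}
```

If the newest xmin passes the check, every older xmin on the page passes as well, which is why one comparison per page is enough in this model.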



  [text/x-patch] v14-0021-Inline-TransactionIdFollows-Precedes.patch (5.0K, 22-v14-0021-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From 7ea26725c69aba6f269692387a6e923614181cc4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v14 21/24] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
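
As a standalone illustration of the modulo-2^32 comparison that the patch above moves into the header, the following sketch mirrors TransactionIdPrecedes() with local names. Xid, xid_is_normal() and xid_precedes() are hypothetical names for this example only. The cast to a signed 32-bit integer assumes the usual two's-complement conversion, as the original code does.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* XIDs below 3 are permanent (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID	((Xid) 3)

static inline bool
xid_is_normal(Xid xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Is id1 logically older than id2? */
static inline bool
xid_precedes(Xid id1, Xid id2)
{
	int32_t		diff;

	/* Permanent XIDs sort before all normal ones: plain unsigned compare. */
	if (!xid_is_normal(id1) || !xid_is_normal(id2))
		return id1 < id2;

	/* Both normal: the sign of the wrapped difference gives the order. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

With this definition an XID just below the 2^32 wraparound point, such as 4294967290, precedes a small XID such as 5, because the wrapped difference is negative. A plain unsigned comparison would give the opposite answer.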



  [text/x-patch] v14-0022-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 23-v14-0022-Unset-all-visible-sooner-if-not-freezing.patch)
  download | inline diff:
From eea3df3f0660f868df56fa0043c182b2fb3c0258 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v14 22/24] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bb7a1357a89..c29f47ab151 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1522,8 +1522,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1776,8 +1779,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v14-0024-Set-pd_prune_xid-on-insert.patch (6.5K, 24-v14-0024-Set-pd_prune_xid-on-insert.patch)
From 0134ca707f4c64620ff26c69d703b79ec421ac91 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v14 24/24] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up tuples
inserted by aborted transactions, which would otherwise not be cleaned
up until vacuum. So, that's another benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which in turn changes the number of hits reported in the
index-killtuples isolation test. This is a quirk of how hits are
tracked, which sometimes leads to them being double counted. This should
probably be fixed or changed independently.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)
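
As a rough model of the hint this patch sets on insert, the sketch below
keeps the oldest normal XID seen as the page's prune hint and skips
non-normal XIDs (bootstrap mode) and pages already set frozen. The ToyXid
and ToyPage types, the toy_* helpers and the constants are assumptions
made for illustration; the real logic is PageSetPrunable() and
TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID       ((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID  ((ToyXid) 3)

/* Hypothetical page header holding only the prune hint. */
typedef struct ToyPage
{
    ToyXid      prune_xid;
} ToyPage;

static bool
toy_xid_is_normal(ToyXid xid)
{
    return xid >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is older than b", valid for normal XIDs only. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID that might make something on the page prunable. */
static void
toy_set_prunable(ToyPage *page, ToyXid xid)
{
    assert(toy_xid_is_normal(xid));

    if (page->prune_xid == TOY_INVALID_XID ||
        toy_xid_precedes(xid, page->prune_xid))
        page->prune_xid = xid;
}

/* Insert-time decision, mirroring the guard used in heap_multi_insert(). */
static void
toy_insert_hint(ToyPage *page, ToyXid xid, bool all_frozen_set)
{
    if (!all_frozen_set && toy_xid_is_normal(xid))
        toy_set_prunable(page, xid);
}
```

A later insert with a newer XID leaves the hint alone, so the hint always
names the oldest XID that could make the page worth pruning.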

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 94d673d92c0..47aa9638724 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a8908373067..a2c4e4f47fe 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -486,6 +486,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -635,9 +641,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



  [text/x-patch] v14-0023-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.9K, 25-v14-0023-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From feedf2af7c6e0f025d4c0b35d7f7cb9df71e18a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v14 23/24] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 73 +++++++++++++++----
 src/backend/access/index/indexam.c            | 46 ++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++-
 src/backend/executor/execMain.c               |  4 +
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 ++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 +++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 +++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 285 insertions(+), 38 deletions(-)
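
The two decisions this commit adds can be sketched in isolation: turning
"does the query modify the relation" into a scan flag, and declining to
set the VM on-access when that would newly dirty the heap page or force a
full-page image. The flag values and the toy_* names below are
hypothetical; only the shape of the conditions follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; only the names echo the patch's ScanOptions. */
enum
{
    TOY_SO_TYPE_SEQSCAN = 1 << 0,
    TOY_SO_ALLOW_VM_SET = 1 << 10
};

/* Scans of relations the query does not modify may set the VM. */
static uint32_t
toy_scan_flags(bool modifies_rel)
{
    uint32_t    flags = TOY_SO_TYPE_SEQSCAN;

    if (!modifies_rel)
        flags |= TOY_SO_ALLOW_VM_SET;
    return flags;
}

/*
 * Return true when an on-access call should leave the VM alone: nothing
 * was pruned or frozen, and setting the VM would either newly dirty the
 * heap page or require a full-page image of an already dirty page.
 */
static bool
toy_skip_vm_set(bool on_access, bool all_visible,
                bool do_prune, bool do_freeze,
                bool heap_dirty, bool needs_fpi)
{
    return on_access && all_visible &&
        !do_prune && !do_freeze &&
        (!heap_dirty || needs_fpi);
}

/* Small usage example exercising both helpers. */
static void
toy_demo(void)
{
    assert(toy_scan_flags(false) & TOY_SO_ALLOW_VM_SET);
    assert(!(toy_scan_flags(true) & TOY_SO_ALLOW_VM_SET));

    /* Clean page, nothing pruned: skip rather than dirty it. */
    assert(toy_skip_vm_set(true, true, false, false, false, false));
    /* Dirty page with no FPI needed: go ahead and set the VM. */
    assert(!toy_skip_vm_set(true, true, false, false, true, false));
}
```

Vacuum callers are unaffected by the second check in this model because
on_access is false for them.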

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ab514ce65ec..94d673d92c0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c29f47ab151..3eaee398735 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -185,9 +187,13 @@ static void page_verify_redirects(Page page);
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -251,6 +257,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -258,8 +271,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer,
-									   InvalidBuffer, false,
-									   PRUNE_ON_ACCESS, 0, NULL,
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   false,	/* blk_known_av */
+									   PRUNE_ON_ACCESS, options, NULL,
 									   vistest, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -443,6 +457,8 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
 					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
@@ -450,6 +466,32 @@ heap_page_will_set_vis(Relation relation,
 	Page		heap_page = BufferGetPage(heap_buf);
 	bool		do_set_vm = false;
 
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -473,6 +515,9 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * XXX: This will never trigger for on-access pruning because it passes
+	 * blk_known_av as false. Should we remove that condition here?
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -615,6 +660,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -692,7 +738,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
-	else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+	else if (prstate.attempt_update_vm)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = false;
@@ -951,7 +997,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	if (prstate.lpdead_items > 0)
 		prstate.all_visible = prstate.all_frozen = false;
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
+
 
 	/*
 	 * Determine whether or not to set the page level PD_ALL_VISIBLE and the
@@ -968,12 +1014,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * As such, it is possible to only update the VM when PD_ALL_VISIBLE is
 	 * already set.
 	 */
-	do_set_pd_vis = false;
-	do_set_vm = false;
-	if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
-		do_set_vm = heap_page_will_set_vis(relation,
-										   blockno, buffer, vmbuffer, blk_known_av,
-										   &prstate, &new_vmbits, &do_set_pd_vis);
+	do_set_vm = heap_page_will_set_vis(relation,
+									   blockno, buffer, vmbuffer, blk_known_av,
+									   reason, do_prune, do_freeze,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/* Lock vmbuffer before entering a critical section */
 	if (do_set_vm)
@@ -1133,7 +1179,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(relation, buffer,
 									  prstate.vistest,
@@ -2298,8 +2343,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 35a25cf0b04..4da629067d1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block into this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block into
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -369,7 +386,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   Buffer vmbuffer, bool blk_known_av,
 									   PruneReason reason,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3a920cc7d17..c854be93436 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-18 16:48  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2025-09-18 16:48 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Robert Haas <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
> 0001 is RFC but waiting on one other reviewer

> From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 17 Jun 2025 17:22:10 -0400
> Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
> diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
> index cf843277938..faa7c561a8a 100644
> --- a/src/backend/access/heap/heapam_xlog.c
> +++ b/src/backend/access/heap/heapam_xlog.c
> @@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
>  	int			i;
>  	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
>  	XLogRedoAction action;
> +	Buffer		vmbuffer = InvalidBuffer;
>
>  	/*
>  	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
> @@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
>  	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
>  	{
>  		Relation	reln = CreateFakeRelcacheEntry(rlocator);
> -		Buffer		vmbuffer = InvalidBuffer;
>
>  		visibilitymap_pin(reln, blkno, &vmbuffer);
>  		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
>  		ReleaseBuffer(vmbuffer);
> +		vmbuffer = InvalidBuffer;
>  		FreeFakeRelcacheEntry(reln);
>  	}
>
> @@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
>  	if (BufferIsValid(buffer))
>  		UnlockReleaseBuffer(buffer);
>
> +	buffer = InvalidBuffer;
> +
> +	/*
> +	 * Now read and update the VM block.
> +	 *
> +	 * Note that the heap relation may have been dropped or truncated, leading
> +	 * us to skip updating the heap block due to the LSN interlock.

I don't fully understand this - how does dropping/truncating the relation lead
to skipping due to the LSN interlock?


> +	 * even in that case, it's still safe to update the visibility map. Any
> +	 * WAL record that clears the visibility map bit does so before checking
> +	 * the page LSN, so any bits that need to be cleared will still be
> +	 * cleared.
> +	 *
> +	 * Note that the lock on the heap page was dropped above. In normal
> +	 * operation this would never be safe because a concurrent query could
> +	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
> +	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
> +	 * the VM is set.
> +	 *
> +	 * In recovery, we expect no other writers, so writing to the VM page
> +	 * without holding a lock on the heap page is considered safe enough. It
> +	 * is done this way when replaying xl_heap_visible records (see
> +	 * heap_xlog_visible()).
> +	 */
> +	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
> +		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
> +									  &vmbuffer) == BLK_NEEDS_REDO)
> +	{

Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
heap_xlog_visible(), but I don't immediately understand (or remember) why we
do so there either?


> +		Page		vmpage = BufferGetPage(vmbuffer);
> +		Relation	reln = CreateFakeRelcacheEntry(rlocator);

Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
really like to eventually get rid of that and given that the new "code shape"
delegates a lot more responsibility to the redo routines, they should have a
fairly easy time not needing a fake relcache?  Afaict the relation already is
not used outside of debugging paths?


> +		/* initialize the page if it was read as zeros */
> +		if (PageIsNew(vmpage))
> +			PageInit(vmpage, BLCKSZ, 0);
> +
> +		visibilitymap_set_vmbits(reln, blkno,
> +								 vmbuffer,
> +								 VISIBILITYMAP_ALL_VISIBLE |
> +								 VISIBILITYMAP_ALL_FROZEN);
> +
> +		/*
> +		 * It is not possible that the VM was already set for this heap page,
> +		 * so the vmbuffer must have been modified and marked dirty.
> +		 */

I assume that's because we a) checked the LSN interlock b) are replaying
something that needed to newly set the bit?


Except for the above comments, this looks pretty good to me.


Seems 0002 should just be applied...


Re 0003: I wonder if it's getting to the point that a struct should be used as
the argument.

Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-24 17:07  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-09-24 17:07 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Robert Haas <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Sep 18, 2025 at 12:48 PM Andres Freund <[email protected]> wrote:
>
> On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
>
> > +     /*
> > +      * Now read and update the VM block.
> > +      *
> > +      * Note that the heap relation may have been dropped or truncated, leading
> > +      * us to skip updating the heap block due to the LSN interlock.
>
> I don't fully understand this - how does dropping/truncating the relation lead
> to skipping due to the LSN interlock?

Yes, this wasn't right. I misunderstood.

What I think it should say is that even if the heap update was skipped
due to the LSN interlock, we still have to replay the updates to the VM.
Each VM page contains bits for multiple heap blocks, and if the record
included a VM page FPI, subsequent updates to the VM may rely on that
FPI to avoid torn pages. We don't condition it on the heap redo having
been an FPI, probably because it is not worth it -- but I wonder if
that is worth calling out in the comment?

Do we also need to replay it when the heap redo returns BLK_NOTFOUND?
I assume this can happen in the case of relation dropped or truncated
-- but in this case there wouldn't be subsequent records updating the
VM for other heap blocks that we need to replay because the other heap
blocks won't be found either, right?
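
To illustrate the point that each VM page holds bits for many heap
blocks, here is a minimal standalone sketch modeled on the addressing
macros in visibilitymap.c. The toy_ names and the 8 kB block / 24-byte
page header figures are assumptions for a default build; this is not
code from the patch set.

```c
#include <stdint.h>

/* Assumed defaults: 8 kB blocks and a 24-byte (already aligned) page header. */
#define TOY_BLCKSZ				8192
#define TOY_PAGE_HEADER_SIZE	24

/* Two status bits (all-visible, all-frozen) per heap block. */
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	(8 / TOY_BITS_PER_HEAPBLOCK)

/* Bytes of map data per VM page, and heap blocks covered by one VM page. */
#define TOY_MAPSIZE				(TOY_BLCKSZ - TOY_PAGE_HEADER_SIZE)
#define TOY_HEAPBLOCKS_PER_PAGE	(TOY_MAPSIZE * TOY_HEAPBLOCKS_PER_BYTE)

typedef uint32_t ToyBlockNumber;

/* Which VM page holds the bits for this heap block? */
static ToyBlockNumber
toy_heapblk_to_mapblock(ToyBlockNumber heapblk)
{
	return heapblk / TOY_HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page's map area? */
static uint32_t
toy_heapblk_to_mapbyte(ToyBlockNumber heapblk)
{
	return (heapblk % TOY_HEAPBLOCKS_PER_PAGE) / TOY_HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two status bits within that byte. */
static uint32_t
toy_heapblk_to_offset(ToyBlockNumber heapblk)
{
	return (heapblk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK;
}
```

Under these assumptions one VM page covers 32672 heap blocks, so a VM
page image carried by one record can matter to the replay of later
records that touch entirely different heap pages.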

> > +     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
> > +             XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
> > +                                                                       &vmbuffer) == BLK_NEEDS_REDO)
> > +     {
>
> Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
> heap_xlog_visible(), but I don't immediately understand (or remember) why we
> do so there either?

It has been RBM_ZERO_ON_ERROR since XLogReadBufferForRedoExtended()
was introduced in 2c03216d8311.
I think we probably do this because vm_readbuf() passes
RBM_ZERO_ON_ERROR to ReadBuffer() and has this comment:

     * For reading we use ZERO_ON_ERROR mode, and initialize the page if
     * necessary. It's always safe to clear bits, so it's better to clear
     * corrupt pages than error out.

Do you think I also should have a comment in heap_xlog_multi_insert()?
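
As a rough illustration of why zeroing is the conservative choice, here
is a toy model, not PostgreSQL code. The toy_ functions and the flat
page layout (no page header) are hypothetical; the only thing taken
from the comment above is the idea that an unreadable VM page is
replaced by zeros, which is the same as clearing every bit on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192

/*
 * Hypothetical stand-in for a zero-on-error read: if the stored copy fails
 * validation, hand back an all-zeros page instead of raising an error.
 */
static void
toy_read_zero_on_error(uint8_t *dst, const uint8_t *stored,
					   bool stored_is_valid)
{
	assert(dst != NULL && stored != NULL);

	if (stored_is_valid)
		memcpy(dst, stored, TOY_PAGE_SIZE);
	else
		memset(dst, 0, TOY_PAGE_SIZE);	/* every status bit is now clear */
}

/*
 * Return the two status bits for a heap block.  Toy layout: four heap
 * blocks per byte, no page header.  0 means "not known all-visible".
 */
static uint8_t
toy_vm_status(const uint8_t *page, uint32_t heapblk)
{
	return (page[heapblk / 4] >> ((heapblk % 4) * 2)) & 0x03;
}
```

After a failed read every heap block reports status 0, so readers fall
back to checking the heap page itself; nothing is wrongly treated as
all-visible.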

> > +             Page            vmpage = BufferGetPage(vmbuffer);
> > +             Relation        reln = CreateFakeRelcacheEntry(rlocator);
>
> Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
> really like to eventually get rid of that and given that the new "code shape"
> delegates a lot more responsibility to the redo routines, they should have a
> fairly easy time not needing a fake relcache?  Afaict the relation already is
> not used outside of debugging paths?

Yes, interestingly we don't have the relname in recovery anyway, so it
does all this fake relcache stuff only to convert the relfilenode to a
string and use that.

The fake relcache stuff will still be used by visibilitymap_pin()
which callers like heap_xlog_delete() use to get the VM page. And I
don't think it is worth coming up with a version of it that doesn't
use the relcache. But you're right that the Relation is not needed for
visibilitymap_set_vmbits(). I've changed it to just take the relation
name as a string.


> > +             /* initialize the page if it was read as zeros */
> > +             if (PageIsNew(vmpage))
> > +                     PageInit(vmpage, BLCKSZ, 0);
> > +
> > +             visibilitymap_set_vmbits(reln, blkno,
> > +                                                              vmbuffer,
> > +                                                              VISIBILITYMAP_ALL_VISIBLE |
> > +                                                              VISIBILITYMAP_ALL_FROZEN);
> > +
> > +             /*
> > +              * It is not possible that the VM was already set for this heap page,
> > +              * so the vmbuffer must have been modified and marked dirty.
> > +              */
>
> I assume that's because we a) checked the LSN interlock b) are replaying
> something that needed to newly set the bit?

Yes, perhaps it is not worth having the assert since it attracts extra
attention to an invariant that is unlikely to be in danger of
regression.

> Seems 0002 should just be applied...

Done

> Re 0003: I wonder if it's getting to the point that a struct should be used as
> the argument.

I have been thinking about this. I have yet to come up with a good
idea for a struct name or multiple struct names that seem to fit here.
I could move the other output parameters into the PruneFreezeResult
and then maybe make some kind of PruneFreezeParameters struct or
something?
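
One possible shape for that split, purely as a sketch: every type and
field name below is hypothetical (Toy-prefixed stand-ins, not the
PostgreSQL definitions) and the function body is a placeholder. It only
shows read-only inputs grouped behind a const pointer and all outputs
grouped in the result struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are PostgreSQL definitions. */
typedef uint32_t ToyTransactionId;
typedef enum { TOY_PRUNE_ON_ACCESS, TOY_PRUNE_VACUUM_SCAN } ToyPruneReason;

#define TOY_PRUNE_FREEZE (1 << 1)

typedef struct ToyVacuumCutoffs
{
	ToyTransactionId oldest_xmin;
} ToyVacuumCutoffs;

/* Hypothetical: all read-only inputs travel in one struct. */
typedef struct ToyPruneFreezeParams
{
	ToyPruneReason reason;
	int			options;
	const ToyVacuumCutoffs *cutoffs;	/* NULL unless freezing is requested */
} ToyPruneFreezeParams;

/* Hypothetical: all outputs travel in the result struct. */
typedef struct ToyPruneFreezeResult
{
	bool		attempted_freeze;
	ToyTransactionId new_relfrozen_xid;
} ToyPruneFreezeResult;

/* Placeholder body: only demonstrates the calling convention. */
static void
toy_prune_and_freeze(const ToyPruneFreezeParams *params,
					 ToyPruneFreezeResult *result)
{
	assert(params != NULL && result != NULL);

	result->attempted_freeze = (params->options & TOY_PRUNE_FREEZE) != 0;
	result->new_relfrozen_xid =
		(result->attempted_freeze && params->cutoffs != NULL)
		? params->cutoffs->oldest_xmin
		: 0;
}
```

A caller would fill in the params struct with designated initializers,
so adding a new input later would not touch every call site's argument
list.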

- Melanie


Attachments:

  [text/x-patch] v15-0002-Reorder-heap_page_prune_and_freeze-parameters.patch (6.2K, 2-v15-0002-Reorder-heap_page_prune_and_freeze-parameters.patch)
From bca3a9c979507bc631193bd9ca5d39556bed383d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v15 02/23] Reorder heap_page_prune_and_freeze parameters

Move read-only parameters to the beginning of the function, making it
more clear which parameters are inputs and which are input/outputs or
outputs. Also const-qualify VacuumCutoffs, which is not modified in
heap_page_prune_and_freeze().
---
 src/backend/access/heap/pruneheap.c  | 40 ++++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  6 +++--
 src/include/access/heapam.h          |  6 ++---
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..28bd6a56749 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		freeze;
-	struct VacuumCutoffs *cutoffs;
+	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
 	 * Fields describing what to do to the page
@@ -260,8 +260,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+									   vistest, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -303,7 +303,17 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * reason indicates why the pruning is performed.  It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
+ *
+ * options:
+ *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ *   pruning.
+ *
+ *   FREEZE indicates that we will also freeze tuples, and will return
+ *   'all_visible', 'all_frozen' flags to the caller.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
@@ -313,29 +323,19 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
  * that also freeze need that information.
  *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
  * cutoffs->OldestXmin is also used to determine if dead tuples are
  * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
  *
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -348,11 +348,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
+						   PruneReason reason,
 						   int options,
-						   struct VacuumCutoffs *cutoffs,
+						   const struct VacuumCutoffs *cutoffs,
+						   GlobalVisState *vistest,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..ddc9677694c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1974,8 +1974,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+							   &vacrel->cutoffs,
+							   vacrel->vistest,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a1de400b9a5..665e0c79baf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -368,11 +368,11 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   struct GlobalVisState *vistest,
+									   PruneReason reason,
 									   int options,
-									   struct VacuumCutoffs *cutoffs,
+									   const struct VacuumCutoffs *cutoffs,
+									   struct GlobalVisState *vistest,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
-- 
2.43.0



  [text/x-patch] v15-0004-Rename-PruneState.freeze-to-attempt_freeze.patch (4.9K, 3-v15-0004-Rename-PruneState.freeze-to-attempt_freeze.patch)
From bbf405f68ab042b1a01241a9700ad3506ebea789 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v15 04/23] Rename PruneState.freeze to attempt_freeze

This makes it more clear that this is to indicate the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
ultimately will end up freezing them.

Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.

And rename local variable do_hint to do_hint_prune. This distinguishes
the prunable and page full hints used to decide whether or not to
on-access prune a page from other page-level and tuple hint bits.
---
 src/backend/access/heap/pruneheap.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea8216e0632..740aa07cd83 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,7 +42,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -361,14 +361,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
-	bool		hint_bit_fpi;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -390,7 +390,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -437,7 +437,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -551,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -659,7 +659,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -667,7 +667,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -702,14 +702,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 */
 				if (RelationNeedsWAL(relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_prune)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -752,7 +752,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_prune)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -893,7 +893,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1475,7 +1475,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
-- 
2.43.0



  [text/x-patch] v15-0005-Add-helper-for-freeze-determination-to-heap_page.patch (7.0K, 4-v15-0005-Add-helper-for-freeze-determination-to-heap_page.patch)
From 8d5d18247faca56c12938333161aa2d19e70341e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v15 05/23] Add helper for freeze determination to
 heap_page_prune_and_freeze

After scanning through the line pointers on the heap page during
vacuum's first phase, we use several statuses and information we
collected to determine whether or not we will use the freeze plans we
assembled.

Do this in a helper for better readability.
---
 src/backend/access/heap/pruneheap.c | 199 +++++++++++++++++-----------
 1 file changed, 119 insertions(+), 80 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 740aa07cd83..4ed74de6f27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -289,6 +289,120 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			Assert(prstate->all_visible);
+
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -666,87 +780,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				Assert(prstate.all_visible);
+	do_freeze = heap_page_will_freeze(relation, buffer,
+									  did_tuple_hint_fpi,
+									  do_prune,
+									  do_hint_prune,
+									  &prstate);
 
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_prune)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
-- 
2.43.0
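
The freeze decision that this patch moves into heap_page_will_freeze() can be modelled outside the backend. The following is only a sketch under stated assumptions: FreezeInputs and will_freeze_sketch() are hypothetical names, and the boolean fields stand in for RelationNeedsWAL(), XLogCheckBufferNeedsBackup() and XLogHintBitIsNeeded(), which cannot be called in a standalone program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the real function reads. */
typedef struct FreezeInputs
{
	bool		attempt_freeze;
	bool		freeze_required;	/* models pagefrz.freeze_required */
	bool		all_visible;
	bool		all_frozen;
	int			nfrozen;
	bool		relation_needs_wal; /* models RelationNeedsWAL() */
	bool		did_tuple_hint_fpi;
	bool		do_prune;
	bool		do_hint_prune;
	bool		buffer_needs_backup;	/* models XLogCheckBufferNeedsBackup() */
	bool		hint_bit_wal_needed;	/* models XLogHintBitIsNeeded() */
} FreezeInputs;

/* Returns true if the page's freeze plans should be executed. */
static bool
will_freeze_sketch(FreezeInputs *in)
{
	bool		do_freeze = false;

	if (in->attempt_freeze)
	{
		if (in->freeze_required)
			do_freeze = true;	/* must freeze to advance relfrozenxid */
		else if (in->all_frozen && in->nfrozen > 0)
		{
			assert(in->all_visible);

			/* Freeze opportunistically only if an FPI is emitted anyway. */
			if (in->relation_needs_wal)
			{
				if (in->did_tuple_hint_fpi)
					do_freeze = true;
				else if (in->do_prune)
					do_freeze = in->buffer_needs_backup;
				else if (in->do_hint_prune)
					do_freeze = in->hint_bit_wal_needed &&
						in->buffer_needs_backup;
			}
		}
	}

	if (!do_freeze && in->nfrozen > 0)
	{
		/* Freeze plans exist but are skipped: page will not be all-frozen. */
		assert(!in->freeze_required);
		in->all_frozen = false;
		in->nfrozen = 0;
	}

	return do_freeze;
}
```

The sketch omits heap_pre_freeze_checks(), which the real function runs on the tuples to be frozen once it has decided to freeze.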



  [text/x-patch] v15-0003-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (5.3K, 5-v15-0003-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
From 026fe909ccd34e8f7ca92a56c83e9c2aac813a10 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v15 03/23] Keep all_frozen updated in
 heap_page_prune_and_freeze

We previously relied on all_visible and all_frozen only ever being used
together, but it is best to keep both of them up to date.

Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
 src/backend/access/heap/pruneheap.c  | 21 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  9 ++++-----
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28bd6a56749..ea8216e0632 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -142,10 +142,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -696,8 +692,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * used anymore.  The opportunistic freeze heuristic must be
 			 * improved; however, for now, try to approximate the old logic.
 			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
+			if (prstate.all_frozen && prstate.nfrozen > 0)
 			{
+				Assert(prstate.all_visible);
+
 				/*
 				 * Freezing would make the page all-frozen.  Have already
 				 * emitted an FPI or will do so anyway?
@@ -750,6 +748,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 	}
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -819,7 +818,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 */
 			if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
+				if (prstate.all_frozen)
 					frz_conflict_horizon = prstate.visibility_cutoff_xid;
 				else
 				{
@@ -1382,7 +1381,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1404,7 +1403,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1417,7 +1416,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1436,7 +1435,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1454,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ddc9677694c..50cc898087f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2003,7 +2003,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2056,6 +2055,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2161,11 +2161,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0



  [text/x-patch] v15-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (12.4K, 6-v15-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch)
From 5ae81ae4429b21bab607a053c80e5b9217b48751 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v15 01/23] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++------
 src/backend/access/heap/heapam_xlog.c   | 59 ++++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 70 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  3 ++
 5 files changed, 163 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 * going to add further frozen rows to it.
 		 *
 		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(relation));
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..73aaaef9d8e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,62 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * We must redo changes to the VM even if the heap page was skipped due to
+	 * LSN interlock. Each block of the VM contains bits for multiple heap
+	 * blocks and subsequent records may contain updates to other bits in this
+	 * block. If this record contains an FPI, subsequent records may rely on
+	 * it for protection against a torn page.
+	 *
+	 * The changes to the heap page are replayed first to maintain the
+	 * invariant that PD_ALL_VISIBLE must be set if the VM is set.
+	 *
+	 * Note that the lock on the heap page was dropped above. In normal
+	 * operation this would never be safe because a concurrent query could
+	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+	 * the VM is set.
+	 *
+	 * In recovery, we expect no other writers, so writing to the VM page
+	 * without holding a lock on the heap page is considered safe enough. It
+	 * is done this way when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN,
+								 relname);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		pfree(relname);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..b28460392b7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,73 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * heapRelname is used only for debugging purposes.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags,
+						 const char *heapRelname)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %u",
+		 flags, heapRelname, heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags,
+									  const char *heapRelname);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
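
For readers unfamiliar with the VM layout, the bit manipulation in visibilitymap_set_vmbits() can be tried in isolation. This is a minimal sketch, assuming two bits per heap block packed four blocks to a byte as in the patch's arithmetic; the SKETCH_* macros and vm_set_bits_sketch() are hypothetical names, the map is a plain byte array for a single VM page, and the "dirtied" flag stands in for MarkBufferDirty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ALL_VISIBLE			0x01
#define SKETCH_ALL_FROZEN			0x02
#define SKETCH_VALID_BITS			0x03
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_HEAPBLOCKS_PER_BYTE	4

/*
 * Set flags for heap_blk in map and return the previous status bits.
 * *dirtied reports whether the map byte needed to change.
 */
static uint8_t
vm_set_bits_sketch(uint8_t *map, uint32_t heap_blk, uint8_t flags,
				   bool *dirtied)
{
	uint32_t	map_byte = heap_blk / SKETCH_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
										SKETCH_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Flags must be valid; this function never clears bits. */
	assert((flags & SKETCH_VALID_BITS) == flags);

	/* Never set all-frozen without also setting all-visible. */
	assert(flags != SKETCH_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & SKETCH_VALID_BITS;
	*dirtied = false;
	if (flags != status)
	{
		map[map_byte] |= (uint8_t) (flags << map_offset);
		*dirtied = true;
	}

	return status;
}
```

Because the update is an OR, calling it again with the same flags leaves the byte alone and reports the page as not dirtied. That is why the redo code can assert BufferIsDirty() only when it knows the bits were not set before.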



  [text/x-patch] v15-0006-Update-PruneState.all_-visible-frozen-sooner-in-.patch (7.3K, 7-v15-0006-Update-PruneState.all_-visible-frozen-sooner-in-.patch)
From 3221391ad5b194084715fcbac076948cf79dfcc9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v15 06/23] Update PruneState.all_[visible|frozen] sooner in
 pruning

We don't clear PruneState.all_visible and all_frozen during pruning when
we see LP_DEAD items because we still want to opportunistically freeze a
page if it would become all-frozen after vacuum's third phase.

Currently, this is fine because heap_page_prune_and_freeze() doesn't set
PD_ALL_VISIBLE or set bits in the VM. If we want to do that in the
future, we need all_visible and all_frozen to be accurate earlier in
heap_page_prune_and_freeze(). To do this, we must also move up
determination of the freeze conflict horizon. We use the visibility
cutoff xid even if the whole page won't be frozen until after vacuum's
third phase.
---
 src/backend/access/heap/pruneheap.c | 95 ++++++++++++++---------------
 1 file changed, 45 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed74de6f27..5e536bd0d4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -296,7 +296,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * pre-freeze checks.
  *
  * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon to use for the WAL record should we decide to
+ * freeze tuples.
  *
  * prstate is an input/output parameter.
  *
@@ -308,7 +310,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -378,6 +381,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it.  Otherwise we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -478,6 +497,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
@@ -546,10 +566,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -784,8 +804,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
 
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
@@ -846,27 +882,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -890,30 +907,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
-- 
2.43.0
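
The two steps this patch reorders, choosing the freeze conflict horizon and then correcting all_visible/all_frozen for LP_DEAD items, can be sketched as below. All names are hypothetical. xid_retreat_sketch() is meant to mirror TransactionIdRetreat(), assuming 32-bit XIDs with 3 as the first normal XID, and it ignores the wraparound-aware comparisons that the backend uses elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t XidSketch;

#define SKETCH_FIRST_NORMAL_XID 3

/* Step back one XID, skipping the special XIDs below the first normal one. */
static XidSketch
xid_retreat_sketch(XidSketch xid)
{
	do
	{
		xid--;
	} while (xid < SKETCH_FIRST_NORMAL_XID);

	return xid;
}

/* Conflict horizon for a record that freezes tuples. */
static XidSketch
freeze_conflict_horizon_sketch(bool all_frozen,
							   XidSketch visibility_cutoff_xid,
							   XidSketch oldest_xmin)
{
	/* A page that will become all-frozen can use the tighter cutoff. */
	if (all_frozen)
		return visibility_cutoff_xid;

	/* Otherwise be conservative: step back from OldestXmin. */
	return xid_retreat_sketch(oldest_xmin);
}

typedef struct PageStateSketch
{
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} PageStateSketch;

/* Run only after the freeze decision, so LP_DEAD items cannot affect it. */
static void
finalize_visibility_sketch(PageStateSketch *state)
{
	if (state->lpdead_items > 0)
		state->all_visible = state->all_frozen = false;

	assert(!state->all_frozen || state->all_visible);
}
```

The ordering matters: the horizon is computed while all_frozen still ignores LP_DEAD items, and the flags are only afterwards corrected to reflect the page's current state.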



  [text/x-patch] v15-0008-Combine-vacuum-phase-I-VM-update-cases.patch (4.4K, 8-v15-0008-Combine-vacuum-phase-I-VM-update-cases.patch)
From 78d8fd0ab8ef94c9de7f3a4c8f308ce0c2cba54b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 17:48:38 -0400
Subject: [PATCH v15 08/23] Combine vacuum phase I VM update cases

We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.

Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
---
 src/backend/access/heap/vacuumlazy.c | 68 ++++++++--------------------
 1 file changed, 18 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 308abff16ca..5a6bbbd97f2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2058,15 +2058,22 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
+	 * Handle setting visibility map bits based on information from the VM (as
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * all_frozen variables.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
+		/*
+		 * If the page is all-frozen, we can pass InvalidTransactionId as our
+		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+		 * everything safe for REDO was logged when the page's tuples were
+		 * frozen.
+		 */
 		if (presult.all_frozen)
 		{
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2079,6 +2086,12 @@ lazy_scan_prune(LVRelState *vacrel,
 									   flags);
 
 		/*
+		 * Even if we are only setting the all-frozen bit, there is a small
+		 * chance that the VM was modified sometime between setting
+		 * all_visible_according_to_vm and checking the visibility during
+		 * pruning. Check old_vmbits, returned by visibilitymap_set(), to
+		 * ensure the visibility map counters used for logging are accurate.
+		 *
 		 * If the page wasn't already set all-visible and/or all-frozen in the
 		 * VM, count it as newly set for logging.
 		 */
@@ -2100,6 +2113,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/*
+	 * Now handle two potential corruption cases:
+	 *
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
 	 * page-level bit is clear.  However, it's possible that the bit got
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
@@ -2144,53 +2159,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
 
 	return presult.ndeleted;
 }
-- 
2.43.0



  [text/x-patch] v15-0009-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch (9.2K, 9-v15-0009-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch)
  download | inline diff:
From e5fba63482e7d3bd44a991773ac3da50d2402781 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 10:39:31 -0400
Subject: [PATCH v15 09/23] Vacuum phase III set PD_ALL_VISIBLE in vacuum WAL
 record

Instead of setting PD_ALL_VISIBLE on the heap page when setting bits in
the VM, set it when flipping the line pointers on the page to LP_UNUSED.
This will allow us to omit the heap page from the VM WAL chain.

To do this, we must check if the page will be all-visible once we flip
the line pointers before we actually do so.

One functional change is that a single critical section surrounds both
the VM update and the heap update. Previously they were each in a
critical section, so we could crash and have set PD_ALL_VISIBLE but not
set bits in the VM.
---
 src/backend/access/heap/vacuumlazy.c | 140 ++++++++++++++++++++-------
 1 file changed, 105 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5a6bbbd97f2..9bfcd67a61b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2793,6 +2798,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2803,6 +2809,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	if (heap_page_would_be_all_visible(vacrel, buffer,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2822,6 +2840,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	/*
+	 * The page will never have PD_ALL_VISIBLE already set, so if we are
+	 * setting the VM, we must set PD_ALL_VISIBLE as well.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+		PageSetAllVisible(page);
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2833,7 +2858,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
 								  InvalidTransactionId,
 								  false,	/* no cleanup lock required */
-								  false,
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -2842,36 +2867,26 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	}
 
 	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
+	 * Note that we don't end the critical section until after emitting the VM
+	 * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
+	 * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
+	 * to be set and the VM to be clear, we should do our best to keep these
+	 * in sync. This does mean that we will take a lock on the VM buffer
+	 * inside of a critical section, which is generally discouraged. There is
+	 * precedent for this in other callers of visibilitymap_set(), though.
 	 */
-	END_CRIT_SECTION();
 
 	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
+	 * Now that we have removed the LP_DEAD items from the page, set the
+	 * visibility map if the page became all-visible/all-frozen. Changes to
+	 * the heap page have already been logged.
 	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
 		visibilitymap_set(vacrel->rel, blkno, buffer,
 						  InvalidXLogRecPtr,
 						  vmbuffer, visibility_cutoff_xid,
-						  flags);
+						  vmflags);
 
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
@@ -2879,6 +2894,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			vacrel->vm_new_visible_frozen_pages++;
 	}
 
+	END_CRIT_SECTION();
+
 	/* Revert to the previous phase information for error traceback */
 	restore_vacuum_error_info(vacrel, &saved_err_info);
 }
@@ -3540,30 +3557,77 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
  */
 static bool
 heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid,
 						 bool *all_frozen)
 {
+
+	return heap_page_would_be_all_visible(vacrel, buf,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
+ *
+ * deadoffsets are the offsets the caller knows about and already removed
+ * associated index entries. Vacuum will call this before setting those line
+ * pointers LP_UNUSED. So, if there are no new LP_DEAD items, then the page
+ * can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ *
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
+ *
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid)
+{
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3591,9 +3655,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
-- 
2.43.0
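
A minimal sketch of the matching walk that heap_page_would_be_all_visible()
gains in the patch above, assuming a heavily simplified page model.
SketchLpState and sketch_would_be_all_visible() are hypothetical names
invented for illustration. The real function inspects ItemId flags and tuple
headers and also computes *all_frozen and *visibility_cutoff_xid, which are
omitted here. Because deadoffsets[] is strictly sorted and the page is
scanned in offset order, one cursor into the array is enough to tell
"LP_DEAD item the caller is about to remove" from "LP_DEAD item the caller
does not know about".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;	/* stand-in for PostgreSQL's OffsetNumber */

/* Hypothetical line pointer states; not PostgreSQL's ItemId flags. */
typedef enum
{
	SKETCH_LP_UNUSED,			/* skipped, as unused items are */
	SKETCH_LP_VISIBLE,			/* normal tuple, visible to everyone */
	SKETCH_LP_NOT_VISIBLE,		/* normal tuple, not yet visible to all */
	SKETCH_LP_DEAD				/* LP_DEAD stub left behind by pruning */
} SketchLpState;

/*
 * Returns true if the page would be all-visible once the items listed in
 * deadoffsets[] (strictly ascending, 1-based) are set unused.
 */
static bool
sketch_would_be_all_visible(const SketchLpState *items, int nitems,
							const OffsetNumber *deadoffsets,
							int ndeadoffsets)
{
	int			matched_dead_count = 0;

	for (int i = 0; i < nitems; i++)
	{
		OffsetNumber offnum = (OffsetNumber) (i + 1);

		switch (items[i])
		{
			case SKETCH_LP_UNUSED:
			case SKETCH_LP_VISIBLE:
				break;
			case SKETCH_LP_NOT_VISIBLE:
				return false;
			case SKETCH_LP_DEAD:
				/* A dead item is acceptable only if the caller listed it. */
				if (deadoffsets == NULL ||
					matched_dead_count >= ndeadoffsets ||
					deadoffsets[matched_dead_count] != offnum)
					return false;
				matched_dead_count++;
				break;
		}
	}
	return true;
}
```

Passing NULL and 0 for the array reproduces the behavior of the
heap_page_is_all_visible() wrapper: any LP_DEAD item makes the page not
all-visible.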



  [text/x-patch] v15-0007-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch (16.2K, 10-v15-0007-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch)
  download | inline diff:
From 7554e6c7b6e9d1570e473b5c096eee84aab4f5db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:32:35 -0400
Subject: [PATCH v15 07/23] Set PD_ALL_VISIBLE in heap_page_prune_and_freeze

After phase I of vacuum, if the heap page was rendered all-visible, we
can set it as such in the VM. We also must set the page-level
PD_ALL_VISIBLE bit. By setting PD_ALL_VISIBLE while making the other
changes to the heap page instead of while updating the VM, we can omit
the heap page from the WAL chain during the VM update. The result is
that XLOG_HEAP2_PRUNE_VACUUM_SCAN records include updates to
PD_ALL_VISIBLE.

This commit doesn't yet remove the heap page from the WAL chain because
it does not change other users of visibilitymap_set().

On-access pruning does not enable setting PD_ALL_VISIBLE.

Note that this is carefully coded such that, if the only modification to
the page during heap_page_prune_and_freeze() is setting PD_ALL_VISIBLE
and checksums/wal_log_hints are disabled, we will never emit a full page
image of the heap page.

This also fixes a longstanding issue where, when checksums/wal_log_hints
are enabled, an all-visible page being set all-frozen may not mark the
buffer dirty before visibilitymap_set() stamps it with the
xl_heap_visible LSN.

It is noteworthy that the checks for page corruption and an inconsistent
state between the heap page and the VM in lazy_scan_prune() now happen
after having set PD_ALL_VISIBLE. That is not a functional change because
the corruption cases are mutually exclusive with cases where we would
set PD_ALL_VISIBLE.
---
 src/backend/access/heap/heapam_xlog.c | 63 +++++++++++++++++++----
 src/backend/access/heap/pruneheap.c   | 72 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c  | 29 +----------
 src/include/access/heapam.h           |  2 +
 src/include/access/heapam_xlog.h      |  2 +
 5 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 73aaaef9d8e..4ea1a186c98 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -90,6 +90,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+		bool		do_prune;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -97,11 +98,13 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,17 +141,52 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * The critical integrity requirement here is that we must never end
+		 * up with a situation where the visibility map bit is set, and the
+		 * page-level PD_ALL_VISIBLE bit is clear.  If that were to occur,
+		 * then a subsequent page modification would fail to clear the
+		 * visibility map bit.
+		 */
+		if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
+			PageSetAllVisible(page);
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
-
-		PageSetLSN(page, lsn);
 		MarkBufferDirty(buffer);
+
+		/*
+		 * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+		 * careful not to emit a full page image unless
+		 * checksums/wal_log_hints are enabled. We only set the heap page LSN
+		 * if full page images were an option when emitting WAL. Otherwise,
+		 * subsequent modifications of the page may incorrectly skip emitting
+		 * a full page image.
+		 */
+		if (do_prune || nplans > 0 ||
+			((xlrec.flags & XLHP_SET_PD_ALL_VIS) && XLogHintBitIsNeeded()))
+			PageSetLSN(page, lsn);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers, or set PD_ALL_VISIBLE, update
+	 * the free space map.
+	 *
+	 * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
+	 * space), we'll still update the FSM for this page. Since the FSM is not
+	 * WAL-logged and only updated heuristically, it easily becomes stale in
+	 * standbys.  If the standby is later promoted and runs VACUUM, it will
+	 * skip updating individual free space figures for pages that became
+	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
+	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+	 * space values to upper FSM layers; later inserters try to use such pages
+	 * only to find out that they are unusable.  This can cause long stalls
+	 * when there are many such pages.
+	 *
+	 * Forestall those problems by updating FSM's idea about a page that is
+	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
@@ -157,10 +195,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	{
 		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
 						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
+						   XLHP_HAS_NOW_UNUSED_ITEMS |
+						   XLHP_SET_PD_ALL_VIS))
 		{
 			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
+			/*
+			 * We want to avoid holding an exclusive lock on the heap buffer
+			 * while doing IO, so we'll release the lock on the heap buffer
+			 * first.
+			 */
 			UnlockReleaseBuffer(buffer);
 
 			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
@@ -173,10 +217,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 /*
  * Replay XLOG_HEAP2_VISIBLE records.
  *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
+ * the heap page. We must never end up with a situation where the visibility
+ * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear.  If that
+ * were to occur, then a subsequent page modification would fail to clear the
+ * visibility map bit.
  */
 static void
 heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e536bd0d4d..9b25131543b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -495,6 +495,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -824,6 +825,22 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+
+	/*
+	 * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
+	 * allowed for the page-level bit to be set and the VM to be clear.
+	 * Setting PD_ALL_VISIBLE when we are making the changes to the page that
+	 * render it all-visible allows us to omit the heap page from the WAL
+	 * chain when later updating the VM -- even when checksums/wal_log_hints
+	 * are enabled.
+	 */
+	do_set_pd_vis = false;
+	if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+	{
+		if (prstate.all_visible && !PageIsAllVisible(page))
+			do_set_pd_vis = true;
+	}
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -844,14 +861,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_pd_vis)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -865,6 +885,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -891,7 +914,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 			log_heap_prune_and_freeze(relation, buffer,
 									  conflict_xid,
-									  true, reason,
+									  true,
+									  do_set_pd_vis,
+									  reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -2078,6 +2103,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2086,6 +2115,7 @@ void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2125,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2134,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 * Note that if we explicitly skip an FPI, we must not set the heap page
+	 * LSN later.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2112,7 +2156,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2169,6 +2213,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (set_pd_all_vis)
+		xlrec.flags |= XLHP_SET_PD_ALL_VIS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2247,17 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	/*
+	 * We must bump the page LSN if pruning or freezing. If we are only
+	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
+	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+	 * for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 50cc898087f..308abff16ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1970,7 +1970,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -2073,21 +2073,6 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
@@ -2168,17 +2153,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		uint8		old_vmbits;
 
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
 		/*
 		 * Set the page all-frozen (and all-visible) in the VM.
 		 *
@@ -2891,6 +2865,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
 								  InvalidTransactionId,
 								  false,	/* no cleanup lock required */
+								  false,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 665e0c79baf..34fe5603512 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
@@ -384,6 +385,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..7d3fb75dda7 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -294,6 +294,8 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
 
+#define		XLHP_SET_PD_ALL_VIS			(1 << 0)
+
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
-- 
2.43.0
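
A minimal sketch of the full-page-image and page-LSN decision that the patch
above adds to log_heap_prune_and_freeze(), reduced to plain booleans.
SketchWalDecision and sketch_prune_wal_decision() are hypothetical names, and
hint_bits_need_wal stands in for the result of XLogHintBitIsNeeded(). It is
meant only to show that the two conditions in the patch are exact
complements: the heap page LSN is bumped precisely when a full page image
was allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type for this sketch only. */
typedef struct
{
	bool		no_image;		/* would register with REGBUF_NO_IMAGE */
	bool		set_lsn;		/* would call PageSetLSN() afterwards */
} SketchWalDecision;

static SketchWalDecision
sketch_prune_wal_decision(bool do_prune, int nfrozen,
						  bool set_pd_all_vis, bool hint_bits_need_wal)
{
	SketchWalDecision d;

	/* Skip the FPI when nothing was pruned or frozen and no hint WAL is due. */
	d.no_image = !do_prune &&
		nfrozen == 0 &&
		(!set_pd_all_vis || !hint_bits_need_wal);

	/* Bump the LSN for pruning, freezing, or a WAL-logged PD_ALL_VISIBLE. */
	d.set_lsn = do_prune || nfrozen > 0 ||
		(set_pd_all_vis && hint_bits_need_wal);

	return d;
}
```

With set_pd_all_vis as the only change and checksums/wal_log_hints disabled,
the sketch yields no_image = true and set_lsn = false, which matches the
case the commit message describes as never emitting a full page image.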



  [text/x-patch] v15-0010-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch (3.0K, 11-v15-0010-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch)
  download | inline diff:
From 112df3de663b3cee2a4e1b6c267bc880a2d39c6c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 18:11:49 -0400
Subject: [PATCH v15 10/23] Log setting empty pages PD_ALL_VISIBLE with
 XLOG_HEAP2_VACUUM_SCAN

Though not a big win for this particular case, if we use the
XLOG_HEAP2_PRUNE_VACUUM_SCAN record to log setting PD_ALL_VISIBLE on the
heap page, we can omit the heap page from the WAL chain when setting the
visibility map. A follow-on commit will actually remove the heap page
from the VM set WAL chain.
---
 src/backend/access/heap/vacuumlazy.c | 43 +++++++++++++++++++---------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfcd67a61b..c016f8f7c25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,23 +1879,38 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		{
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
-			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				PageGetLSN(page) == InvalidXLogRecPtr)
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+			{
+				/*
+				 * It's possible that another backend has extended the heap,
+				 * initialized the page, and then failed to WAL-log the page
+				 * due to an ERROR.  Since heap extension is not WAL-logged,
+				 * recovery might try to replay our record setting the page
+				 * all-visible and find that the page isn't initialized, which
+				 * will cause a PANIC. To prevent that, check whether the page
+				 * has been previously WAL-logged, and if not, do that now.
+				 *
+				 * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
+				 * heap page. Doing this in a separate record from setting the
+				 * VM allows us to omit the heap page from the VM WAL chain.
+				 */
+				if (PageGetLSN(page) == InvalidXLogRecPtr)
+					log_newpage_buffer(buf, true);
+				else
+					log_heap_prune_and_freeze(vacrel->rel, buf,
+											  InvalidTransactionId, /* conflict xid */
+											  false,	/* cleanup lock */
+											  true, /* set_pd_all_vis */
+											  PRUNE_VACUUM_SCAN,	/* reason */
+											  NULL, 0,
+											  NULL, 0,
+											  NULL, 0,
+											  NULL, 0);
+			}
 
-			PageSetAllVisible(page);
 			visibilitymap_set(vacrel->rel, blkno, buf,
 							  InvalidXLogRecPtr,
 							  vmbuffer, InvalidTransactionId,
-- 
2.43.0
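
A minimal sketch of which heap-page WAL record the patch above emits for an
empty page in lazy_scan_new_or_empty(), before the separate VM update.
SketchEmptyPageWal and sketch_empty_page_wal() are hypothetical names. The
two inputs stand in for RelationNeedsWAL() and for the
PageGetLSN(page) == InvalidXLogRecPtr test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three outcomes in this sketch. */
typedef enum
{
	SKETCH_WAL_NONE,			/* relation is not WAL-logged */
	SKETCH_WAL_NEWPAGE,			/* log_newpage_buffer(): full image of page */
	SKETCH_WAL_PRUNE_SET_PD_ALL_VIS /* prune record with set_pd_all_vis */
} SketchEmptyPageWal;

static SketchEmptyPageWal
sketch_empty_page_wal(bool relation_needs_wal, bool page_lsn_is_invalid)
{
	if (!relation_needs_wal)
		return SKETCH_WAL_NONE;

	/* A page that was never WAL-logged must be logged in full first. */
	if (page_lsn_is_invalid)
		return SKETCH_WAL_NEWPAGE;

	/* Otherwise only the PD_ALL_VISIBLE change needs to be recorded. */
	return SKETCH_WAL_PRUNE_SET_PD_ALL_VIS;
}
```

Because the patch moves PageSetAllVisible() ahead of the logging calls, the
page image written in the SKETCH_WAL_NEWPAGE case already carries
PD_ALL_VISIBLE.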



  [text/x-patch] v15-0011-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch (11.8K, 12-v15-0011-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch)
  download | inline diff:
From f14db744ecb79b121d8d0d3489384a93bd6abf07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 11:05:30 -0400
Subject: [PATCH v15 11/23] Remove heap buffer from XLOG_HEAP2_VISIBLE WAL
 chain

Now that all users of visibilitymap_set() include setting PD_ALL_VISIBLE
in the WAL record capturing other changes to the heap page, we no longer
need to include the heap buffer in the WAL chain for setting the VM.
---
 src/backend/access/heap/heapam.c        | 16 +-----
 src/backend/access/heap/heapam_xlog.c   | 76 +++----------------------
 src/backend/access/heap/vacuumlazy.c    |  6 +-
 src/backend/access/heap/visibilitymap.c | 31 +---------
 src/include/access/heapam_xlog.h        |  3 +-
 src/include/access/visibilitymap.h      |  2 +-
 6 files changed, 16 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..d4d83a6f9fe 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8806,21 +8806,14 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
  *
  * snapshotConflictHorizon comes from the largest xmin on the page being
  * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
  */
 XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer vm_buffer,
 				 TransactionId snapshotConflictHorizon, uint8 vmflags)
 {
 	xl_heap_visible xlrec;
 	XLogRecPtr	recptr;
-	uint8		flags;
 
-	Assert(BufferIsValid(heap_buffer));
 	Assert(BufferIsValid(vm_buffer));
 
 	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -8829,14 +8822,7 @@ log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
 		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
 	XLogBeginInsert();
 	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
 	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
 
 	return recptr;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 4ea1a186c98..2e9fda0a9bf 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -229,15 +229,12 @@ heap_xlog_visible(XLogReaderState *record)
 	XLogRecPtr	lsn = record->EndRecPtr;
 	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
 	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
 
 	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
 
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
+	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 
 	/*
 	 * If there are any Hot Standby transactions running that have an xmin
@@ -254,70 +251,11 @@ heap_xlog_visible(XLogReaderState *record)
 											rlocator);
 
 	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
+	 * Even if the record that modified the heap page skipped its update
+	 * (because the relation was dropped or truncated, or due to the LSN
+	 * interlock), it's still safe to update the visibility map.  Any WAL
+	 * record that clears the visibility map bit does so before checking the
+	 * page LSN, so any bits that need to be cleared will still be cleared.
 	 */
 	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
 									  &vmbuffer) == BLK_NEEDS_REDO)
@@ -341,7 +279,7 @@ heap_xlog_visible(XLogReaderState *record)
 
 		reln = CreateFakeRelcacheEntry(rlocator);
 
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+		visibilitymap_set(reln, blkno, lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
 		ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c016f8f7c25..735f1e7501e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1911,7 +1911,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 											  NULL, 0);
 			}
 
-			visibilitymap_set(vacrel->rel, blkno, buf,
+			visibilitymap_set(vacrel->rel, blkno,
 							  InvalidXLogRecPtr,
 							  vmbuffer, InvalidTransactionId,
 							  VISIBILITYMAP_ALL_VISIBLE |
@@ -2100,7 +2100,7 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
+		old_vmbits = visibilitymap_set(vacrel->rel, blkno,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   flags);
@@ -2898,7 +2898,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 */
 	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		visibilitymap_set(vacrel->rel, blkno, buffer,
+		visibilitymap_set(vacrel->rel, blkno,
 						  InvalidXLogRecPtr,
 						  vmbuffer, visibility_cutoff_xid,
 						  vmflags);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index b28460392b7..33541e36674 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -233,9 +233,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * when a page that is already all-visible is being marked all-frozen.
  *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -244,7 +242,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * Returns the state of the page's VM bits before setting flags.
  */
 uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
 				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
 {
@@ -261,18 +259,11 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 #endif
 
 	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
 	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
 
 	/* Must never set all_frozen bit without also setting all_visible bit */
 	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
 
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -294,23 +285,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 			if (XLogRecPtrIsInvalid(recptr))
 			{
 				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
+				recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
 			}
 			PageSetLSN(page, recptr);
 		}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 7d3fb75dda7..82b8f7f2bbc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -440,7 +440,6 @@ typedef struct xl_heap_inplace
  * This is what we need to know about setting a visibility map bit
  *
  * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
  */
 typedef struct xl_heap_visible
 {
@@ -493,7 +492,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel,
 								   Buffer vm_buffer,
 								   TransactionId snapshotConflictHorizon,
 								   uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..fbc69604d57 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,7 +32,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
 extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
+							   BlockNumber heapBlk,
 							   XLogRecPtr recptr,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
-- 
2.43.0



  [text/x-patch] v15-0012-Make-heap_page_is_all_visible-independent-of-LVR.patch (6.8K, 13-v15-0012-Make-heap_page_is_all_visible-independent-of-LVR.patch)
From 0174b06e74adeedf425ac159cd04b11c9c35fd73 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 15:39:31 -0400
Subject: [PATCH v15 12/23] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
 1 file changed, 37 insertions(+), 20 deletions(-)
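
The shape of this refactoring can be sketched outside of PostgreSQL. The
function below is a simplified model, not the patch's code: it takes the
cutoff and an output pointer for the error-callback offset explicitly,
instead of reaching into a state struct. The name
page_would_be_all_visible, the xmins array standing in for the page's
tuples, and the plain integer comparison (no XID wraparound handling) are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;

#define InvalidTransactionId	((TransactionId) 0)
#define InvalidOffsetNumber		((OffsetNumber) 0)
#define FirstOffsetNumber		((OffsetNumber) 1)

/*
 * Hypothetical model: xmins[i] is the xmin of the live tuple at offset
 * FirstOffsetNumber + i.  Returns true if every xmin precedes OldestXmin.
 */
bool
page_would_be_all_visible(const TransactionId *xmins, size_t ntuples,
						  TransactionId OldestXmin,
						  TransactionId *visibility_cutoff_xid,
						  OffsetNumber *logging_offnum)
{
	bool		all_visible = true;

	assert(visibility_cutoff_xid != NULL && logging_offnum != NULL);
	*visibility_cutoff_xid = InvalidTransactionId;

	for (size_t i = 0; i < ntuples && all_visible; i++)
	{
		/* Publish the current offset so an error callback could report it. */
		*logging_offnum = (OffsetNumber) (FirstOffsetNumber + i);

		if (xmins[i] >= OldestXmin)
			all_visible = false;	/* not yet visible to everyone */
		else if (xmins[i] > *visibility_cutoff_xid)
			*visibility_cutoff_xid = xmins[i];	/* track the newest xmin */
	}

	/* Clear the offset once the page has been processed. */
	*logging_offnum = InvalidOffsetNumber;

	return all_visible;
}
```

A caller that owns a state struct passes pointers to its own fields (as
the patch does with &vacrel->offnum), while a caller without one can pass
a local variable.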

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 735f1e7501e..a0f3984e37f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,18 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid);
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,8 +2035,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2824,9 +2830,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
-	if (heap_page_would_be_all_visible(vacrel, buffer,
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid))
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
@@ -3576,15 +3584,19 @@ dead_items_cleanup(LVRelState *vacrel)
  * callers that expect no LP_DEAD on the page.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(vacrel, buf,
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid);
+										  visibility_cutoff_xid,
+										  logging_offnum);
 }
 
 /*
@@ -3599,7 +3611,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ * OldestXmin is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3607,6 +3619,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
  * visible tuples. It is only valid if the page is all-visible.
  *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
  * Callers looking to verify that the page is already all-visible can call
  * heap_page_is_all_visible().
  *
@@ -3616,11 +3631,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * to avoid introducing new side-effects here.
  */
 static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid)
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3655,7 +3672,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3685,9 +3702,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
 										 buf))
 		{
 			case HEAPTUPLE_LIVE:
@@ -3708,7 +3725,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+											   OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3743,7 +3760,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0



  [text/x-patch] v15-0013-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (20.9K, 14-v15-0013-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From b6d38f5938f2614b89e76a372cf88f2a857216e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 15:52:18 -0400
Subject: [PATCH v15 13/23] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III

Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that is rendered all-visible by vacuum's third phase, include the
updates to the VM in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP
record.

The visibilitymap bits are stored in the flags member of the
xl_heap_prune struct.

This can decrease the number of WAL records vacuum phase III emits by
as much as half.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 134 ++++++++++++++++++-------
 src/backend/access/heap/pruneheap.c    |  37 ++++++-
 src/backend/access/heap/vacuumlazy.c   |  38 +++----
 src/backend/access/rmgrdesc/heapdesc.c |  11 +-
 src/include/access/heapam.h            |   1 +
 src/include/access/heapam_xlog.h       |  25 ++++-
 6 files changed, 177 insertions(+), 69 deletions(-)
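
The translation between the VM bits and the xl_heap_prune flags appears
twice in this patch: once when building the record in
log_heap_prune_and_freeze() and once when replaying it in
heap_xlog_prune_freeze(). Here is a minimal standalone sketch of that
round trip. The function names are hypothetical, the VISIBILITYMAP_*
values are assumed to match visibilitymapdefs.h, and the XLHP_VM_* bit
positions are placeholders; the real definitions are in heapam_xlog.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Placeholder bit positions, chosen only for this sketch. */
#define XLHP_VM_ALL_VISIBLE		(1 << 8)
#define XLHP_VM_ALL_FROZEN		(1 << 9)

/* Hypothetical: encode VM bits into record flags when logging. */
uint16_t
vmflags_to_xlhp(uint8_t vmflags)
{
	uint16_t	xlflags = 0;

	assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);

	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
	{
		xlflags |= XLHP_VM_ALL_VISIBLE;
		/* all-frozen is only recorded together with all-visible */
		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
			xlflags |= XLHP_VM_ALL_FROZEN;
	}
	return xlflags;
}

/* Hypothetical: decode record flags back into VM bits during redo. */
uint8_t
xlhp_to_vmflags(uint16_t xlflags)
{
	uint8_t		vmflags = 0;

	if (xlflags & XLHP_VM_ALL_VISIBLE)
	{
		vmflags = VISIBILITYMAP_ALL_VISIBLE;
		if (xlflags & XLHP_VM_ALL_FROZEN)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}
```

Nesting the all-frozen test inside the all-visible test means that
neither direction can produce all-frozen without all-visible.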

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2e9fda0a9bf..dcd0dba45a0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+	{
+		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -100,6 +113,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 
 		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 ||
+			   vmflags & VISIBILITYMAP_VALID_BITS ||
+			   xlrec.flags & XLHP_SET_PD_ALL_VIS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
@@ -147,15 +165,23 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		 * page-level PD_ALL_VISIBLE bit is clear.  If that were to occur,
 		 * then a subsequent page modification would fail to clear the
 		 * visibility map bit.
+		 *
+		 * Note: we don't worry about updating the page's prunability hints.
+		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
 		if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
 			PageSetAllVisible(page);
 
 		/*
-		 * Note: we don't worry about updating the page's prunability hints.
-		 * At worst this will cause an extra prune cycle to occur soon.
+		 * We must never end up with the VM bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+		 * modification would fail to clear the VM bit.
 		 */
-		MarkBufferDirty(buffer);
+		Assert(!(vmflags & VISIBILITYMAP_VALID_BITS) || PageIsAllVisible(page));
+
+		/* If this record only sets the VM, no need to dirty the heap page */
+		if (do_prune || nplans > 0 || xlrec.flags & XLHP_SET_PD_ALL_VIS)
+			MarkBufferDirty(buffer);
 
 		/*
 		 * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
@@ -171,47 +197,81 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we released any space or line pointers or set PD_ALL_VISIBLE update
-	 * the freespace map.
+	 * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+	 * VM, update the freespace map.
 	 *
-	 * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
-	 * space), we'll still update the FSM for this page. Since the FSM is not
-	 * WAL-logged and only updated heuristically, it easily becomes stale in
-	 * standbys.  If the standby is later promoted and runs VACUUM, it will
-	 * skip updating individual free space figures for pages that became
-	 * all-visible (or all-frozen, depending on the vacuum mode,) which is
-	 * troublesome when FreeSpaceMapVacuum propagates too optimistic free
-	 * space values to upper FSM layers; later inserters try to use such pages
-	 * only to find out that they are unusable.  This can cause long stalls
-	 * when there are many such pages.
+	 * Even if we are just setting PD_ALL_VISIBLE or updating the VM (and thus
+	 * not freeing up any space), we'll still update the FSM for this page.
+	 * Since the FSM is not WAL-logged and only updated heuristically, it
+	 * easily becomes stale in standbys.  If the standby is later promoted and
+	 * runs VACUUM, it will skip updating individual free space figures for
+	 * pages that became all-visible (or all-frozen, depending on the vacuum
+	 * mode,) which is troublesome when FreeSpaceMapVacuum propagates too
+	 * optimistic free space values to upper FSM layers; later inserters try
+	 * to use such pages only to find out that they are unusable.  This can
+	 * cause long stalls when there are many such pages.
 	 *
 	 * Forestall those problems by updating FSM's idea about a page that is
 	 * becoming all-visible or all-frozen.
 	 *
 	 * Do this regardless of a full-page image being applied, since the FSM
 	 * data is not in the page anyway.
+	 *
+	 * We want to avoid holding an exclusive lock on the heap buffer while
+	 * doing IO (either of the FSM or the VM), so we'll release the lock on
+	 * the heap buffer before doing either.
 	 */
 	if (BufferIsValid(buffer))
 	{
 		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
 						   XLHP_HAS_DEAD_ITEMS |
 						   XLHP_HAS_NOW_UNUSED_ITEMS |
-						   XLHP_SET_PD_ALL_VIS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+						   XLHP_SET_PD_ALL_VIS |
+						   (vmflags & VISIBILITYMAP_VALID_BITS)))
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
-			/*
-			 * We want to avoid holding an exclusive lock on the heap buffer
-			 * while doing IO, so we'll release the lock on the heap buffer
-			 * first.
-			 */
-			UnlockReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
+	}
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * We must redo changes to the VM even if the heap page was skipped due to
+	 * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+	 * on replaying changes to the VM.
+	 */
+	if (vmflags & VISIBILITYMAP_VALID_BITS &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+		uint8		old_vmbits = 0;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+		pfree(relname);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b25131543b..9e00fbf3cd1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
 #include "access/multixact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "access/visibilitymapdefs.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
 #include "executor/instrument.h"
@@ -913,6 +914,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer, 0,
 									  conflict_xid,
 									  true,
 									  do_set_pd_vis,
@@ -2088,14 +2090,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2103,6 +2109,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
  * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
  * the page LSN when checksums/wal_log_hints are enabled even if we did not
  * prune or freeze tuples on the page.
@@ -2113,6 +2123,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  bool set_pd_all_vis,
@@ -2139,6 +2150,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlrec.flags = 0;
 	regbuf_flags = REGBUF_STANDARD;
 
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
 	/*
 	 * We can avoid an FPI if the only modification we are making to the heap
 	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
@@ -2157,6 +2170,10 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	XLogBeginInsert();
 	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2213,6 +2230,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+	{
+		xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+			xlrec.flags |= XLHP_VM_ALL_FROZEN;
+	}
 	if (set_pd_all_vis)
 		xlrec.flags |= XLHP_SET_PD_ALL_VIS;
 	if (RelationIsAccessibleInLogicalDecoding(relation))
@@ -2247,6 +2270,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
 	/*
 	 * We must bump the page LSN if pruning or freezing. If we are only
 	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0f3984e37f..539e5267574 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1906,6 +1906,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 					log_newpage_buffer(buf, true);
 				else
 					log_heap_prune_and_freeze(vacrel->rel, buf,
+											  InvalidBuffer,
+											  0,
 											  InvalidTransactionId, /* conflict xid */
 											  false,	/* cleanup lock */
 											  true, /* set_pd_all_vis */
@@ -2817,6 +2819,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 	uint8		vmflags = 0;
@@ -2842,6 +2845,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
 			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 	}
 
 	START_CRIT_SECTION();
@@ -2868,7 +2874,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * setting the VM, we must set PD_ALL_VISIBLE as well.
 	 */
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
 		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer, vmflags,
+								 RelationGetRelationName(vacrel->rel));
+		conflict_xid = visibility_cutoff_xid;
+	}
 
 	/*
 	 * Mark buffer dirty before we write WAL.
@@ -2879,7 +2891,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer, vmflags,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
@@ -2889,36 +2902,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * Note that we don't end the critical section until after emitting the VM
-	 * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
-	 * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
-	 * to be set and the VM to be clear, we should do our best to keep these
-	 * in sync. This does mean that we will take a lock on the VM buffer
-	 * inside of a critical section, which is generally discouraged. There is
-	 * precedent for this in other callers of visibilitymap_set(), though.
-	 */
+	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, set the
-	 * visibility map if the page became all-visible/all-frozen. Changes to
-	 * the heap page have already been logged.
-	 */
 	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		visibilitymap_set(vacrel->rel, blkno,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  vmflags);
-
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
 	}
 
-	END_CRIT_SECTION();
-
 	/* Revert to the previous phase information for error traceback */
 	restore_vacuum_error_info(vacrel, &saved_err_info);
 }
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+		{
+			uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34fe5603512..e97b53f1ee8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -383,6 +383,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  bool set_pd_all_vis,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 82b8f7f2bbc..833114e0a6e 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,11 +292,17 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
 #define		XLHP_SET_PD_ALL_VIS			(1 << 0)
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -332,6 +338,15 @@ typedef struct xl_heap_prune
 #define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
 #define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
 
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define		XLHP_VM_ALL_VISIBLE			(1 << 8)
+#define		XLHP_VM_ALL_FROZEN			(1 << 9)
+
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
  * (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -498,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
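
A note on the flag widening in the patch above: the two new XLHP_VM_* bits sit at positions 8 and 9, which is why xl_heap_prune.flags has to grow from uint8 to uint16. The sketch below is only an illustration of the translation between the record flags and the visibility map bits. The helper names vmflags_to_xlhp() and xlhp_to_vmflags() are hypothetical and do not appear in the patch; the XLHP_VM_* values are copied from the patch and the VISIBILITYMAP_* values from visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch (heapam_xlog.h). */
#define XLHP_VM_ALL_VISIBLE			(1 << 8)
#define XLHP_VM_ALL_FROZEN			(1 << 9)

/* Values as defined in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/*
 * Hypothetical helper: map VM bits to xl_heap_prune flag bits, mirroring the
 * nested ifs in log_heap_prune_and_freeze(). ALL_FROZEN is only carried over
 * when ALL_VISIBLE is also requested.
 */
static uint16_t
vmflags_to_xlhp(uint8_t vmflags)
{
	uint16_t	flags = 0;

	assert((vmflags & ~VISIBILITYMAP_VALID_BITS) == 0);

	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
	{
		flags |= XLHP_VM_ALL_VISIBLE;
		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
			flags |= XLHP_VM_ALL_FROZEN;
	}
	return flags;
}

/*
 * Hypothetical helper: the reverse direction, following the same shape as
 * the code added to heap2_desc().
 */
static uint8_t
xlhp_to_vmflags(uint16_t xlhp_flags)
{
	uint8_t		vmflags = 0;

	if (xlhp_flags & XLHP_VM_ALL_VISIBLE)
	{
		vmflags = VISIBILITYMAP_ALL_VISIBLE;
		if (xlhp_flags & XLHP_VM_ALL_FROZEN)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}
```

Truncating either XLHP_VM_* value to uint8 yields 0, so a caller still passing the flags through a uint8 parameter would silently lose the VM bits; that is presumably why heap_xlog_deserialize_prune_and_freeze() changes its parameter type as well.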



  [text/x-patch] v15-0014-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch (3.3K, 15-v15-0014-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch)
From b0242b98434d61bcaff239a5731f3d1e65f310f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 16:04:18 -0400
Subject: [PATCH v15 14/23] Set empty pages all-visible in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN record

As part of a project to eliminate XLOG_HEAP2_VISIBLE records, eliminate
their usage in phase I vacuum of empty pages.
---
 src/backend/access/heap/vacuumlazy.c | 56 +++++++++++++++++-----------
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 539e5267574..e8721761392 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1882,11 +1882,22 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			bool		set_pd_all_vis = true;
+
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(vacrel->rel));
+
 			if (RelationNeedsWAL(vacrel->rel))
 			{
 				/*
@@ -1897,34 +1908,37 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				 * all-visible and find that the page isn't initialized, which
 				 * will cause a PANIC. To prevent that, check whether the page
 				 * has been previously WAL-logged, and if not, do that now.
-				 *
-				 * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
-				 * heap page. Doing this in a separate record from setting the
-				 * VM allows us to omit the heap page from the VM WAL chain.
 				 */
 				if (PageGetLSN(page) == InvalidXLogRecPtr)
+				{
 					log_newpage_buffer(buf, true);
-				else
-					log_heap_prune_and_freeze(vacrel->rel, buf,
-											  InvalidBuffer,
-											  0,
-											  InvalidTransactionId, /* conflict xid */
-											  false,	/* cleanup lock */
-											  true, /* set_pd_all_vis */
-											  PRUNE_VACUUM_SCAN,	/* reason */
-											  NULL, 0,
-											  NULL, 0,
-											  NULL, 0,
-											  NULL, 0);
+					set_pd_all_vis = false;
+				}
+
+				/*
+				 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+				 * setting the VM. If we emitted a new page record for the
+				 * page above, setting PD_ALL_VISIBLE will already have been
+				 * included in that record.
+				 */
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  set_pd_all_vis,
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 			}
 
-			visibilitymap_set(vacrel->rel, blkno,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
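
For readers following along: several callers in this series rely on visibilitymap_set_vmbits() returning the bits that were set before the call, and compare that value with the requested bits to decide whether anything changed. The sketch below is a toy model of that contract only. It assumes a flat byte array in place of a VM page, uses the two-bits-per-heap-block layout from visibilitymapdefs.h, and the names toy_vm_set_bits() and toy_vm_update_needs_wal() are hypothetical. It ignores buffers, locking, and page headers entirely.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK		2	/* as in visibilitymapdefs.h */
#define HEAPBLOCKS_PER_BYTE		4	/* 8 bits / BITS_PER_HEAPBLOCK */
#define VM_VALID_BITS			0x03	/* all-visible | all-frozen */

/*
 * Toy stand-in for setting VM bits: OR the requested flags into the slot
 * for heapblk and return the bits that were set beforehand.
 */
static uint8_t
toy_vm_set_bits(uint8_t *map, uint32_t heapblk, uint8_t flags)
{
	uint32_t	byte = heapblk / HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (uint8_t) ((heapblk % HEAPBLOCKS_PER_BYTE) *
								   BITS_PER_HEAPBLOCK);
	uint8_t		old = (uint8_t) ((map[byte] >> shift) & VM_VALID_BITS);

	assert((flags & ~VM_VALID_BITS) == 0);

	map[byte] |= (uint8_t) (flags << shift);
	return old;
}

/*
 * Caller-side check in the spirit of the "old_vmbits == new_vmbits" tests
 * in the series: returns nonzero only if the call changed the map.
 */
static int
toy_vm_update_needs_wal(uint8_t *map, uint32_t heapblk, uint8_t flags)
{
	return toy_vm_set_bits(map, heapblk, flags) != flags;
}
```

In the empty-page case handled by the patch above, both bits are requested at once, so in this model the slot for the block goes from 0x00 to 0x03 in a single step.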



  [text/x-patch] v15-0016-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (12.7K, 16-v15-0016-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 558df69caf2c977989781da2757a4c930728e596 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 17:29:59 -0400
Subject: [PATCH v15 16/23] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the XLOG_HEAP2_PRUNE_VACUUM_SCAN record already emitted.

This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
 src/backend/access/heap/pruneheap.c | 184 +++++++++++++++++-----------
 src/include/access/heapam.h         |   3 +-
 2 files changed, 113 insertions(+), 74 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e3f9967e26c..473822a8e26 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -662,50 +662,58 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Keep track of whether or not the page will be all-visible and
+	 * all-frozen for use in opportunistic freezing and to update the VM if
+	 * the caller requests it.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM attempts freezing. But other callers could. The
+	 * visibility bookkeeping is required for opportunistic freezing (in
+	 * addition to setting the VM bits) because we only consider
+	 * opportunistically freezing tuples if the whole page would become
+	 * all-frozen or if the whole page will be frozen except for dead tuples
+	 * that will be removed by vacuum. But if consider_update_vm is false,
+	 * we'll not set the VM even if the page is discovered to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * If only HEAP_PAGE_PRUNE_UPDATE_VIS is passed and not
+	 * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+	 * because we will not call heap_prepare_freeze_tuple() on each tuple.
+	 *
+	 * Dead tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing, so we do not clear
+	 * all_visible when we see LP_DEAD items. We fix that after determining
+	 * whether or not to freeze but before deciding whether or not to update
+	 * the VM so that we don't set the VM bit incorrectly.
+	 *
+	 * If not freezing and not updating the VM, we avoid the extra
+	 * bookkeeping. Initializing both flags to false allows skipping the work to
+	 * update them in heap_prune_record_unchanged_lp_normal().
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon.
+	 * This is most likely to happen when updating the VM and/or freezing all
+	 * live tuples on the page. It is updated before returning to the caller
+	 * because vacuum does assert-build only validation on the page using this
+	 * field.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -943,16 +951,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * visibility map bits based on information from the VM and from
 	 * all_visible and all_frozen variables.
 	 *
-	 * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
-	 * allowed for the page-level bit to be set and the VM to be clear. We log
-	 * setting PD_ALL_VISIBLE on the heap page in a
-	 * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
-	 * emitted XLOG_HEAP2_VISIBLE record.
+	 * It is allowed for the page-level bit to be set and the VM to be clear;
+	 * however, we have a strong preference for keeping them in sync.
 	 *
-	 * Setting PD_ALL_VISIBLE when we are making the changes to the page that
-	 * render it all-visible allows us to omit the heap page from the WAL
-	 * chain when later updating the VM -- even when checksums/wal_log_hints
-	 * are enabled.
+	 * Prior to Postgres 19, it was possible for the page-level bit to be set
+	 * and the VM bit to be clear. This could happen if we crashed after
+	 * setting PD_ALL_VISIBLE but before setting bits in the VM.
+	 *
+	 * As such, it is possible to only update the VM when PD_ALL_VISIBLE is
+	 * already set.
 	 */
 	do_set_pd_vis = false;
 	do_set_vm = false;
@@ -961,6 +968,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 										   blockno, buffer, vmbuffer, blk_known_av,
 										   &prstate, &new_vmbits, &do_set_pd_vis);
 
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -991,7 +1002,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze || do_set_pd_vis)
+	if (do_prune || do_freeze || do_set_pd_vis || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1008,12 +1019,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		if (do_set_pd_vis)
 			PageSetAllVisible(page);
 
-		MarkBufferDirty(buffer);
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  RelationGetRelationName(relation));
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(relation) &&
+			(do_prune || do_freeze || do_set_pd_vis || do_set_vm))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -1025,15 +1056,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId conflict_xid;
+			TransactionId conflict_xid = InvalidTransactionId;
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+			/*
+			 * If we are updating the VM, the conflict horizon is almost
+			 * always the visibility cutoff XID.
+			 *
+			 * Separately, if we are freezing any tuples, as an optimization,
+			 * we can use the visibility_cutoff_xid as the conflict horizon if
+			 * the page will be all-frozen. This is true even if there are
+			 * LP_DEAD line pointers because we ignored those when maintaining
+			 * the visibility_cutoff_xid. This will have been calculated
+			 * earlier as the frz_conflict_horizon when we determined we would
+			 * freeze.
+			 */
+			if (do_set_vm)
+				conflict_xid = prstate.visibility_cutoff_xid;
+			else if (do_freeze)
 				conflict_xid = frz_conflict_horizon;
-			else
+
+			/*
+			 * If we are removing tuples with a younger xmax than our so far
+			 * calculated conflict_xid, we must use this as our horizon.
+			 */
+			if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
 				conflict_xid = prstate.latest_xid_removed;
 
+			/*
+			 * We can omit the snapshot conflict horizon if we are not pruning
+			 * or freezing any tuples and are setting an already all-visible
+			 * page all-frozen in the VM. In this case, all of the tuples on
+			 * the page must already be visible to all MVCC snapshots on the
+			 * standby.
+			 */
+			if (!do_prune && !do_freeze && do_set_vm &&
+				blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+				conflict_xid = InvalidTransactionId;
+
 			log_heap_prune_and_freeze(relation, buffer,
-									  InvalidBuffer, 0,
+									  vmbuffer, new_vmbits,
 									  conflict_xid,
 									  true,
 									  do_set_pd_vis,
@@ -1047,6 +1108,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
 	/*
@@ -1078,32 +1142,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 #endif
 
-	/* Now set the VM */
-	if (do_set_vm)
-	{
-		TransactionId vm_conflict_horizon;
-
-		Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
-
-		/*
-		 * The conflict horizon for that record must be the newest xmin on the
-		 * page.  However, if the page is completely frozen, there can be no
-		 * conflict and the vm_conflict_horizon should remain
-		 * InvalidTransactionId.  This includes the case that we just froze
-		 * all the tuples; the prune-freeze record included the conflict XID
-		 * already so a snapshotConflictHorizon sufficient to make everything
-		 * safe for REDO was logged when the page's tuples were frozen.
-		 */
-		if (prstate.all_frozen)
-			vm_conflict_horizon = InvalidTransactionId;
-		else
-			vm_conflict_horizon = prstate.visibility_cutoff_xid;
-		old_vmbits = visibilitymap_set(relation, blockno,
-									   InvalidXLogRecPtr,
-									   vmbuffer, vm_conflict_horizon,
-									   new_vmbits);
-	}
-
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
@@ -2261,7 +2299,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
  *   all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 493ddeacbc0..394f62a21e5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -239,7 +239,8 @@ typedef struct PruneFreezeResult
 	 * visibility map before updating it during phase I of vacuuming.
 	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have actually updated the VM.
 	 */
 	uint8		new_vmbits;
 	uint8		old_vmbits;
-- 
2.43.0
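
The conflict horizon selection in the pruneheap.c hunk above is easier to check in isolation. The sketch below restates that logic as a standalone function, as an aid to review rather than as code from the patch. The HorizonInputs struct and choose_conflict_xid() are hypothetical names, and xid_follows() is a simplified copy of the circular XID comparison done by TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Simplified copy of TransactionIdFollows(): is id1 logically > id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical bundle of the values the hunk reads. */
typedef struct HorizonInputs
{
	bool		do_prune;
	bool		do_freeze;
	bool		do_set_vm;
	bool		blk_known_av;
	uint8_t		new_vmbits;
	TransactionId visibility_cutoff_xid;
	TransactionId frz_conflict_horizon;
	TransactionId latest_xid_removed;
} HorizonInputs;

static TransactionId
choose_conflict_xid(const HorizonInputs *in)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* VM update: visibility cutoff; otherwise the freeze horizon, if any. */
	if (in->do_set_vm)
		conflict_xid = in->visibility_cutoff_xid;
	else if (in->do_freeze)
		conflict_xid = in->frz_conflict_horizon;

	/* A removed tuple with a newer xmax pushes the horizon forward. */
	if (xid_follows(in->latest_xid_removed, conflict_xid))
		conflict_xid = in->latest_xid_removed;

	/* Only marking an already all-visible page all-frozen: no conflict. */
	if (!in->do_prune && !in->do_freeze && in->do_set_vm &&
		in->blk_known_av && (in->new_vmbits & VISIBILITYMAP_ALL_FROZEN))
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

With do_set_vm and do_prune set, visibility_cutoff_xid = 100 and latest_xid_removed = 120, this returns 120. With only do_set_vm and blk_known_av set and all-frozen requested, it returns InvalidTransactionId regardless of the cutoff.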



  [text/x-patch] v15-0017-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (21.1K, 17-v15-0017-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 3d5897f742431854cfc9e5cc300a92ae256b3496 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 17:42:54 -0400
Subject: [PATCH v15 17/23] Remove XLOG_HEAP2_VISIBLE entirely

There are now no users of this, so eliminate it entirely.
This includes the xl_heap_visible struct as well as all of the functions
used to emit and replay XLOG_HEAP2_VISIBLE records.
---
 src/backend/access/common/bufmask.c      |  4 +-
 src/backend/access/heap/heapam.c         | 40 ++--------
 src/backend/access/heap/heapam_xlog.c    | 94 ++----------------------
 src/backend/access/heap/pruneheap.c      |  6 +-
 src/backend/access/heap/vacuumlazy.c     | 16 ++--
 src/backend/access/heap/visibilitymap.c  | 85 +--------------------
 src/backend/access/rmgrdesc/heapdesc.c   | 10 ---
 src/backend/replication/logical/decode.c |  1 -
 src/backend/storage/ipc/standby.c        | 12 +--
 src/include/access/heapam_xlog.h         | 19 -----
 src/include/access/visibilitymap.h       | 15 ++--
 src/include/access/visibilitymapdefs.h   |  9 ---
 src/tools/pgindent/typedefs.list         |  1 -
 13 files changed, 41 insertions(+), 271 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d4d83a6f9fe..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(relation));
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(relation));
 		}
 
 		/*
@@ -8798,36 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-	XLogRegisterBuffer(0, vm_buffer, 0);
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index dcd0dba45a0..502517fa62e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -256,7 +256,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+		old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -274,81 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
- * the heap page. We must never end up with a situation where the visibility
- * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear.  If that
- * were to occur, then a subsequent page modification would fail to clear the
- * visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Even if the heap relation was dropped or truncated and the previously
-	 * emitted record skipped the heap page update due to this LSN interlock,
-	 * it's still safe to update the visibility map.  Any WAL record that
-	 * clears the visibility map bit does so before checking the page LSN, so
-	 * any bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -728,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * In recovery, we expect no other writers, so writing to the VM page
 	 * without holding a lock on the heap page is considered safe enough. It
-	 * is done this way when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * is done this way when replaying xl_heap_prune records (see
+	 * heap_xlog_prune_and_freeze()).
 	 */
 	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -744,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 relname);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  relname);
 
 		/*
 		 * It is not possible that the VM was already set for this heap page,
@@ -1334,9 +1259,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 473822a8e26..faf1002f25f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1026,9 +1026,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  RelationGetRelationName(relation));
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   RelationGetRelationName(relation));
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d71f3755dce..e59eb40133d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,11 +1887,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(vacrel->rel));
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(vacrel->rel));
 
 			if (RelationNeedsWAL(vacrel->rel))
 			{
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 RelationGetRelationName(vacrel->rel));
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  RelationGetRelationName(vacrel->rel));
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 33541e36674..8754b737e94 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,82 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set flags in the VM block contained in the passed in vmBuf.
@@ -320,9 +243,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk,
  * is pinned and exclusive locked.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const char *heapRelname)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 833114e0a6e..61ceaf2a98b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -451,19 +450,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -507,11 +493,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fbc69604d57..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const char *heapRelname);
+
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..d400c8429b0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4269,7 +4269,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
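
Reviewer's sketch (not part of the patch): the bit arithmetic that both the
removed visibilitymap_set() and its replacement rely on can be modelled on a
flat byte array. This is a simplified illustration under stated assumptions.
The real code splits the map into pages via HEAPBLK_TO_MAPBLOCK() and works on
a pinned, locked buffer; the sketch_* names are hypothetical, and the value
0x01 for the all-visible bit is taken from visibilitymapdefs.h rather than
from the hunks shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Two VM bits per heap block, so four heap blocks share one map byte. */
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_HEAPBLOCKS_PER_BYTE	4
#define SKETCH_ALL_VISIBLE			0x01
#define SKETCH_ALL_FROZEN			0x02
#define SKETCH_VALID_BITS			0x03

/*
 * Set flags for heap_blk in a flat map (assumption: no page boundaries).
 * Returns the bits that were set before the call, like visibilitymap_set().
 */
static inline uint8_t
sketch_vm_set_flat(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / SKETCH_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
										SKETCH_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Only valid bits may be passed in */
	assert((flags & SKETCH_VALID_BITS) == flags);
	/* Must never set all_frozen without also setting all_visible */
	assert(flags != SKETCH_ALL_FROZEN);

	status = (uint8_t) ((map[map_byte] >> map_offset) & SKETCH_VALID_BITS);
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Heap block 5, for example, lands in byte 1 at bit offset 2, so setting only
the all-visible bit turns that byte from 0x00 into 0x04.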



  [text/x-patch] v15-0018-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 18-v15-0018-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 7821b9fd5001f1e1e20ec0c4857655cc8b781cbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v15 18/23] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()

Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all for the purpose of determining whether to set the page
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 14 +++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 13 ++++++-------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index faf1002f25f..52e956189e8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -218,7 +218,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -727,9 +727,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1200,11 +1200,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v15-0015-Set-VM-in-heap_page_prune_and_freeze.patch (22.3K, 19-v15-0015-Set-VM-in-heap_page_prune_and_freeze.patch)
From 75a2d24ed02733533027b9fe17f25160d2529b0c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 15:46:40 -0400
Subject: [PATCH v15 15/23] Set VM in heap_page_prune_and_freeze

The determination as to whether or not the page can be set
all-visible/all-frozen has already been done by the end of
heap_page_prune_and_freeze(). Vacuum waits until it returns to
lazy_scan_prune() to actually set the VM, though.

This commit moves setting the VM into heap_page_prune_and_freeze().
There are still two separate WAL records -- one for the changes to the
heap page and one for the changes to the VM. But, this is an incremental
step toward logging setting the VM in the same WAL record as pruning and
freezing.

Note that this is not used by on-access pruning.
---
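
Reviewer's sketch (not part of the patch): the decision made by
heap_page_will_set_vis() and the conflict-horizon choice made just before
visibilitymap_set() can be reduced to plain booleans. This is a minimal
illustration under stated assumptions. The Sketch* types and sketch_*
functions are hypothetical stand-ins, the VM lookup is replaced by a
precomputed vm_all_frozen flag, and the two corruption checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ALL_VISIBLE	0x01	/* VISIBILITYMAP_ALL_VISIBLE */
#define SKETCH_ALL_FROZEN	0x02	/* VISIBILITYMAP_ALL_FROZEN */

typedef uint32_t SketchXid;			/* stand-in for TransactionId */
#define SKETCH_INVALID_XID	((SketchXid) 0)	/* InvalidTransactionId */

/* Hypothetical, flattened view of the inputs the patch consults. */
typedef struct SketchPageState
{
	bool		all_visible;	/* prstate->all_visible */
	bool		all_frozen;		/* prstate->all_frozen */
	bool		blk_known_av;	/* caller saw the VM bit set earlier */
	bool		vm_all_frozen;	/* what VM_ALL_FROZEN() would report */
	SketchXid	visibility_cutoff_xid;	/* newest xmin on the page */
} SketchPageState;

/*
 * Returns true if the VM should be updated and stores the bits to set.
 * Unlike the patch, this sketch zeroes *vmflags when nothing is to be set.
 */
static inline bool
sketch_will_set_vm(const SketchPageState *s, uint8_t *vmflags)
{
	/* A page cannot be all-frozen without being all-visible */
	assert(!s->all_frozen || s->all_visible);

	if ((s->all_visible && !s->blk_known_av) ||
		(s->all_frozen && !s->vm_all_frozen))
	{
		*vmflags = SKETCH_ALL_VISIBLE;
		if (s->all_frozen)
			*vmflags |= SKETCH_ALL_FROZEN;
		return true;
	}

	*vmflags = 0;
	return false;
}

/* A fully frozen page cannot conflict, so no horizon is needed for it. */
static inline SketchXid
sketch_vm_conflict_horizon(const SketchPageState *s)
{
	return s->all_frozen ? SKETCH_INVALID_XID : s->visibility_cutoff_xid;
}
```

A page that is all-visible but not all-frozen, and not yet known to be
all-visible, gets only the all-visible bit and keeps its newest xmin as the
conflict horizon. Once the page is also all-frozen, both bits are set and
the horizon becomes invalid.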
 src/backend/access/heap/pruneheap.c  | 221 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 146 ++----------------
 src/include/access/heapam.h          |  24 +--
 3 files changed, 221 insertions(+), 170 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9e00fbf3cd1..e3f9967e26c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/visibilitymapdefs.h"
 #include "access/xloginsert.h"
@@ -257,7 +258,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+			heap_page_prune_and_freeze(relation, buffer,
+									   InvalidBuffer, false,
+									   PRUNE_ON_ACCESS, 0, NULL,
 									   vistest, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -423,16 +426,115 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || PageIsAllVisible(heap_page) || *do_set_pd_vis);
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
+ * vmbuffer is the buffer that must already contain the required block
+ * of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ *
  * reason indicates why the pruning is performed.  It is included in the WAL
  * record for debugging and analysis purposes, but otherwise has no effect.
  *
@@ -443,15 +545,20 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  *   FREEZE indicates that we will also freeze tuples, and will return
  *   'all_visible', 'all_frozen' flags to the caller.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
+ *   UPDATE_VIS indicates that we will set the page's status in the VM.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
  * required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
@@ -478,6 +585,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+						   Buffer vmbuffer, bool blk_known_av,
 						   PruneReason reason,
 						   int options,
 						   const struct VacuumCutoffs *cutoffs,
@@ -496,10 +604,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = vistest;
@@ -828,19 +939,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/*
-	 * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
-	 * allowed for the page-level bit to be set and the VM to be clear.
+	 * Determine whether or not to set the page level PD_ALL_VISIBLE and the
+	 * visibility map bits based on information from the VM and from
+	 * all_visible and all_frozen variables.
+	 *
+	 * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
+	 * allowed for the page-level bit to be set and the VM to be clear. We log
+	 * setting PD_ALL_VISIBLE on the heap page in an
+	 * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
+	 * emitted XLOG_HEAP2_VISIBLE record.
+	 *
 	 * Setting PD_ALL_VISIBLE when we are making the changes to the page that
 	 * render it all-visible allows us to omit the heap page from the WAL
 	 * chain when later updating the VM -- even when checksums/wal_log_hints
 	 * are enabled.
 	 */
 	do_set_pd_vis = false;
+	do_set_vm = false;
 	if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
-	{
-		if (prstate.all_visible && !PageIsAllVisible(page))
-			do_set_pd_vis = true;
-	}
+		do_set_vm = heap_page_will_set_vis(relation,
+										   blockno, buffer, vmbuffer, blk_known_av,
+										   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -928,28 +1047,72 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	END_CRIT_SECTION();
 
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * VACUUM will call heap_page_would_be_all_visible() during the second
+	 * pass over the heap to determine all_visible and all_frozen for the page
+	 * -- this is a specialized version of that logic. Now that we've finished
+	 * pruning and freezing, make sure that we're in total agreement with
+	 * heap_page_would_be_all_visible() using an assertion.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
+	/* Now set the VM */
+	if (do_set_vm)
+	{
+		TransactionId vm_conflict_horizon;
+
+		Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
+
+		/*
+		 * The conflict horizon for that record must be the newest xmin on the
+		 * page.  However, if the page is completely frozen, there can be no
+		 * conflict and the vm_conflict_horizon should remain
+		 * InvalidTransactionId.  This includes the case that we just froze
+		 * all the tuples; the prune-freeze record included the conflict XID
+		 * already so a snapshotConflictHorizon sufficient to make everything
+		 * safe for REDO was logged when the page's tuples were frozen.
+		 */
+		if (prstate.all_frozen)
+			vm_conflict_horizon = InvalidTransactionId;
+		else
+			vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		old_vmbits = visibilitymap_set(relation, blockno,
+									   InvalidXLogRecPtr,
+									   vmbuffer, vm_conflict_horizon,
+									   new_vmbits);
+	}
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e8721761392..d71f3755dce 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
@@ -2015,7 +2010,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+	heap_page_prune_and_freeze(rel, buf,
+							   vmbuffer, all_visible_according_to_vm,
+							   PRUNE_VACUUM_SCAN, prune_options,
 							   &vacrel->cutoffs,
 							   vacrel->vistest,
 							   &presult,
@@ -2036,33 +2033,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2096,112 +2066,28 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bits based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		/*
-		 * If the page is all-frozen, we can pass InvalidTransactionId as our
-		 * cutoff_xid, since a snapshotConflictHorizon sufficient to make
-		 * everything safe for REDO was logged when the page's tuples were
-		 * frozen.
-		 */
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * Even if we are only setting the all-frozen bit, there is a small
-		 * chance that the VM was modified sometime between setting
-		 * all_visible_according_to_vm and checking the visibility during
-		 * pruning. Check the return value of old_vmbits to ensure the
-		 * visibility map counters used for logging are accurate.
-		 *
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * Now handle two potential corruption cases:
-	 *
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
-
 	return presult.ndeleted;
 }
 
@@ -3591,7 +3477,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Wrapper for heap_page_would_be_all_visible() which can be used for
  * callers that expect no LP_DEAD on the page.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId OldestXmin,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e97b53f1ee8..493ddeacbc0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,14 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -369,6 +364,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+									   Buffer vmbuffer, bool blk_known_av,
 									   PruneReason reason,
 									   int options,
 									   const struct VacuumCutoffs *cutoffs,
@@ -397,6 +393,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
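
For readers following the lazy_scan_prune() hunk above, the new counter logic can be modelled outside the server. This is only a minimal sketch: VmCounters and count_vm_transition() are hypothetical names made up for this note, the counters stand in for the vacrel fields, and the bit values are assumed to match VISIBILITYMAP_ALL_VISIBLE (0x01) and VISIBILITYMAP_ALL_FROZEN (0x02).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three vacrel counters used for logging. */
typedef struct
{
    int new_visible_pages;
    int new_visible_frozen_pages;
    int new_frozen_pages;
} VmCounters;

/*
 * Update the counters from the VM bits before and after pruning.
 * Returns true if the page was newly marked all-frozen.
 */
bool
count_vm_transition(VmCounters *c, uint8_t old_bits, uint8_t new_bits)
{
    assert(c != NULL);

    if ((old_bits & VM_ALL_VISIBLE) == 0 &&
        (new_bits & VM_ALL_VISIBLE) != 0)
    {
        /* Page was not all-visible before and is now. */
        c->new_visible_pages++;
        if ((new_bits & VM_ALL_FROZEN) != 0)
        {
            c->new_visible_frozen_pages++;
            return true;
        }
    }
    else if ((old_bits & VM_ALL_FROZEN) == 0 &&
             (new_bits & VM_ALL_FROZEN) != 0)
    {
        /* Page was already all-visible and only the frozen bit is new. */
        assert((new_bits & VM_ALL_VISIBLE) != 0);
        c->new_frozen_pages++;
        return true;
    }
    return false;
}
```

The point of the sketch is that the caller no longer needs all_visible, all_frozen, or a conflict horizon; the old and new bit masks alone are enough to keep the logging counters accurate.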



  [text/x-patch] v15-0020-Inline-TransactionIdFollows-Precedes.patch (5.0K, 20-v15-0020-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From 2ae2b01dc50b7dde504519c0540ece7acf801211 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v15 20/23] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
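
As an aside on what is being inlined above: the comparison is a wraparound-aware ordering of 32-bit XIDs. The following standalone sketch reproduces the same arithmetic outside the server. Xid, xid_is_normal(), xid_precedes(), and xid_follows() are illustrative names, not PostgreSQL symbols, and FIRST_NORMAL_XID is assumed to match FirstNormalTransactionId (3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;               /* stand-in for TransactionId */

#define FIRST_NORMAL_XID ((Xid) 3)  /* XIDs below this are "permanent" */

static inline bool
xid_is_normal(Xid xid)
{
    return xid >= FIRST_NORMAL_XID;
}

/* Is id1 logically older than id2? */
static inline bool
xid_precedes(Xid id1, Xid id2)
{
    int32_t diff;

    /* Permanent XIDs compare as plain unsigned integers. */
    if (!xid_is_normal(id1) || !xid_is_normal(id2))
        return id1 < id2;

    /* Normal XIDs compare modulo 2^32: the sign of the wrapped difference decides. */
    diff = (int32_t) (id1 - id2);
    return diff < 0;
}

/* Is id1 logically newer than id2? */
static inline bool
xid_follows(Xid id1, Xid id2)
{
    int32_t diff;

    if (!xid_is_normal(id1) || !xid_is_normal(id2))
        return id1 > id2;

    diff = (int32_t) (id1 - id2);
    return diff > 0;
}

/* A few cases, including one that straddles the wraparound point. */
void
xid_compare_examples(void)
{
    assert(xid_precedes(100, 200));
    assert(xid_precedes(4294967000u, 100));   /* old XID just before wraparound */
    assert(xid_follows(100, 4294967000u));
    assert(xid_precedes(2, 100));             /* permanent XID: plain comparison */
}
```

Each function is a handful of instructions with no side effects, which is consistent with the commit message's observation that the call overhead showed up in a profile of on-access pruning.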



  [text/x-patch] v15-0021-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 21-v15-0021-Unset-all-visible-sooner-if-not-freezing.patch)
  download | inline diff:
From 74ce20dc392912c2f066c8c32819ae206acfde7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v15 21/23] Unset all-visible sooner if not freezing

In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.

Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.

Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 631394889d7..e64addfdf5d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1523,8 +1523,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1777,8 +1780,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v15-0019-Use-GlobalVisState-to-determine-page-level-visib.patch (10.7K, 22-v15-0019-Use-GlobalVisState-to-determine-page-level-visib.patch)
  download | inline diff:
From dbde84e7d5706b06ea252f75bc5aa7bc39ff2dea Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v15 19/23] Use GlobalVisState to determine page level
 visibility

During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.

It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.

Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined. If we have
maintained the visibility_cutoff_xid, we then compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.

Doing this may mean we examine more tuples' xmins than before, as we may
have set all_visible to false sooner when encountering a live tuple
newer than OldestXmin. However, these extra comparisons were found not
to be significant in a profile.
---
 src/backend/access/heap/heapam_visibility.c | 28 +++++++++++++
 src/backend/access/heap/pruneheap.c         | 46 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 20 ++++-----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52e956189e8..631394889d7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -134,10 +134,9 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon, when setting the VM or when
+	 * freezing all the live tuples on the page.
 	 *
 	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
 	 * convenient for heap_page_prune_and_freeze(), to use them to decide
@@ -706,14 +705,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon.
-	 * This is most likely to happen when updating the VM and/or freezing all
-	 * live tuples on the page. It is updated before returning to the caller
-	 * because vacuum does assert-build only validation on the page using this
-	 * field.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState.
+	 *
+	 * If we encounter an uncommitted tuple, this field is unmaintained. If
+	 * the page is being set all-visible or when freezing all live tuples on
+	 * the page, it is used to calculate the snapshot conflict horizon.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -909,6 +906,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1130,7 +1137,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1656,19 +1663,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e59eb40133d..d9b83fb6115 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2734,7 +2734,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 InvalidOffsetNumber);
 
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3479,14 +3479,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3505,7 +3504,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * *all_frozen is an output parameter indicating to the caller if every tuple
  * on the page is frozen.
@@ -3526,7 +3525,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3598,8 +3597,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+												  buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3618,8 +3617,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 394f62a21e5..34ee323a423 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -395,7 +395,7 @@ extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -407,6 +407,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
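
The once-per-page check described in the commit message above can be illustrated with a toy model. This is a sketch under loud assumptions: VisHorizon and visible_to_all() are hypothetical stand-ins for GlobalVisState and GlobalVisXidVisibleToAll(), the horizon is reduced to a single XID, and wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)

/* Hypothetical stand-in for GlobalVisState: one XID horizon. */
typedef struct
{
    Xid oldest_running;
} VisHorizon;

/* Hypothetical stand-in for the (more expensive) visibility test. */
static bool
visible_to_all(const VisHorizon *vis, Xid xid)
{
    return xid < vis->oldest_running;
}

/*
 * Decide whether a page whose live, committed tuples have the given xmins
 * can be considered all-visible.  The per-tuple loop only tracks the newest
 * normal xmin; the horizon is consulted once, after the loop.
 */
bool
page_is_all_visible(const Xid *live_xmins, size_t ntuples,
                    const VisHorizon *vis)
{
    Xid newest = INVALID_XID;

    assert(vis != NULL);

    for (size_t i = 0; i < ntuples; i++)
    {
        if (live_xmins[i] >= FIRST_NORMAL_XID && live_xmins[i] > newest)
            newest = live_xmins[i];
    }

    /* If even the newest xmin is visible to everyone, all older ones are too. */
    if (newest != INVALID_XID && !visible_to_all(vis, newest))
        return false;

    return true;
}
```

The trade-off matches the one stated in the commit message: the loop may look at more xmins than an early-exit version would, in exchange for a single horizon comparison per page.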



  [text/x-patch] v15-0022-Allow-on-access-pruning-to-set-pages-all-visible.patch (29.0K, 23-v15-0022-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 8df8cf1d9c5baa8d07e623e80dfaeb5ff4b25228 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v15 22/23] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 89 ++++++++++++++-----
 src/backend/access/index/indexam.c            | 46 ++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++-
 src/backend/executor/execMain.c               |  4 +
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 ++--
 src/backend/executor/nodeSeqscan.c            | 24 +++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 ++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 292 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e64addfdf5d..0d8fea346c5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -185,9 +187,13 @@ static void page_verify_redirects(Page page);
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -251,6 +257,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			int			options = 0;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+			}
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -258,8 +271,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * that during on-access pruning with the current implementation.
 			 */
 			heap_page_prune_and_freeze(relation, buffer,
-									   InvalidBuffer, false,
-									   PRUNE_ON_ACCESS, 0, NULL,
+									   vmbuffer ? *vmbuffer : InvalidBuffer,
+									   false,	/* blk_known_av */
+									   PRUNE_ON_ACCESS, options, NULL,
 									   vistest, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
@@ -443,6 +457,8 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
 					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
@@ -450,6 +466,32 @@ heap_page_will_set_vis(Relation relation,
 	Page		heap_page = BufferGetPage(heap_buf);
 	bool		do_set_vm = false;
 
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -473,6 +515,9 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * XXX: This will never trigger for on-access pruning because it passes
+	 * blk_known_av as false. Should we remove that condition here?
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -615,6 +660,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = cutoffs;
 
 	/*
@@ -692,7 +738,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
-	else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+	else if (prstate.attempt_update_vm)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = false;
@@ -906,6 +952,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -916,14 +970,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -951,8 +997,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	if (prstate.lpdead_items > 0)
 		prstate.all_visible = prstate.all_frozen = false;
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
-
 	/*
 	 * Determine whether or not to set the page level PD_ALL_VISIBLE and the
 	 * visibility map bits based on information from the VM and from
@@ -968,12 +1012,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * As such, it is possible to only update the VM when PD_ALL_VISIBLE is
 	 * already set.
 	 */
-	do_set_pd_vis = false;
-	do_set_vm = false;
-	if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
-		do_set_vm = heap_page_will_set_vis(relation,
-										   blockno, buffer, vmbuffer, blk_known_av,
-										   &prstate, &new_vmbits, &do_set_pd_vis);
+	do_set_vm = heap_page_will_set_vis(relation,
+									   blockno, buffer, vmbuffer, blk_known_av,
+									   reason, do_prune, do_freeze,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/* Lock vmbuffer before entering a critical section */
 	if (do_set_vm)
@@ -1134,7 +1178,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(relation, buffer,
 									  prstate.vistest,
@@ -2299,8 +2342,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 0831c33b038..87827127d96 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -174,6 +174,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -200,6 +205,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34ee323a423..9dcf8d29496 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -363,7 +380,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 									   Buffer vmbuffer, bool blk_known_av,
 									   PruneReason reason,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 77eb41eb6dc..6f5d4f9bb65 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3a920cc7d17..c854be93436 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query, either through
+	 * UPDATE/DELETE/INSERT/MERGE or through row marks (e.g. SELECT FOR UPDATE)
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
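
The two decisions this patch introduces can be modeled outside the
server. What follows is a minimal sketch, not PostgreSQL code. The
demo_* names, the uint64_t mask standing in for the es_modified_relids
Bitmapset, and the DEMO_SO_TYPE_SEQSCAN value are assumptions made for
illustration. Only the 1 << 10 value of the VM-set flag and the shape of
the two conditions are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative stand-ins for ScanOptions bits. Only the VM-set bit
 * position (1 << 10) comes from the patch; the other value is assumed.
 */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_VM_SET = 1 << 10
};

/* Hypothetical stand-in for a Bitmapset: bit N set means RT index N. */
typedef uint64_t demo_relid_set;

/* Record that the relation with range-table index rti is modified. */
demo_relid_set
demo_add_member(demo_relid_set set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

/* Is the relation with range-table index rti in the modified set? */
bool
demo_is_member(int rti, demo_relid_set set)
{
	assert(rti >= 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/*
 * Model of the flag derivation in table_beginscan_vmset(): a scan may
 * set the VM only if the query does not modify the scanned relation.
 */
uint32_t
demo_seqscan_flags(int scanrelid, demo_relid_set modified)
{
	uint32_t	flags = DEMO_SO_TYPE_SEQSCAN;

	if (!demo_is_member(scanrelid, modified))
		flags |= DEMO_SO_ALLOW_VM_SET;
	return flags;
}

/*
 * Model of the on-access early exit in heap_page_will_set_vis(): when
 * nothing is pruned or frozen, skip the VM update if it would newly
 * dirty the heap page or force a full-page image into the WAL.
 */
bool
demo_skip_vm_set(bool on_access, bool all_visible,
				 bool do_prune, bool do_freeze,
				 bool buffer_dirty, bool needs_fpi)
{
	return on_access && all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi);
}
```

For example, with RT index 2 in the modified set, demo_seqscan_flags(2,
set) leaves DEMO_SO_ALLOW_VM_SET clear, while a scan of RT index 1 in
the same query gets it. Likewise, an on-access call that neither prunes
nor freezes is skipped for a clean buffer, but proceeds for a buffer
that is already dirty and needs no full-page image.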



  [text/x-patch] v15-0023-Set-pd_prune_xid-on-insert.patch (6.5K, 24-v15-0023-Set-pd_prune_xid-on-insert.patch)
From 3d30243c60a34b3dfd63eff381e86626e0c466e7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v15 23/23] Set pd_prune_xid on insert

Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.

For years, a comment in heap_insert() and heap_multi_insert() has
pointed out that setting pd_prune_xid would help clean up tuples
inserted by aborted transactions, which would otherwise linger until
vacuum. That is a second benefit of setting it.

Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which changes the number of hits reported in the
index-killtuples isolation test. This is a quirk of how hits are
tracked, which sometimes leads to them being double-counted. That should
probably be fixed or changed independently.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would otherwise not be pruned until the next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 502517fa62e..8c2a4a2e847 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -473,6 +473,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -622,9 +628,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-09-24 20:13  Robert Haas <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Robert Haas @ 2025-09-24 20:13 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

I find this patch set quite hard to follow. 0001 altogether removes
the use of XLOG_HEAP2_VISIBLE in cases where we use
XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
assisted by 0009 and 0010, and then later you come back and remove the
other half of the dependency. I know it was I who proposed (off-list)
first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
and not the heap buffer, but I'm not sure that idea quite worked out
in terms of making this easier to follow. At the least, it seems weird
that COPY FREEZE is an exception that gets handled in a different way
than all the other cases, fully removing the dependency in one step.
It would also be nice if each time you repost this, or maybe in a
README that you post along beside the actual patches, you'd include
some kind of roadmap to help the reader understand the internal
structure of the patch set, like 1 does this, 2-9 get us to here,
10-whatever get us to this next place.

I don't really understand how the interlocking works. 0011 changes
visibilitymap_set so that it doesn't take the heap block as an
argument, but we'd better hold a lock on the heap page while setting
the VM bit, otherwise I think somebody could come along and modify the
heap page after we decided it was all-visible and before we actually
set the VM bit, which would be terrible. I would expect the comments
and the commit message to say something about that, but I don't see
that they do.

I find myself fearful of the way that 0007 propagates the existing
hacks around setting the VM bit into a new place:

+               /*
+                * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+                * careful not to emit a full page image unless
+                * checksums/wal_log_hints are enabled. We only set the heap page LSN
+                * if full page images were an option when emitting WAL. Otherwise,
+                * subsequent modifications of the page may incorrectly skip emitting
+                * a full page image.
+                */
+               if (do_prune || nplans > 0 ||
+                       (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+                       PageSetLSN(page, lsn);

I suppose it's not the worst thing to duplicate this logic, because
you're later going to remove the original copy. But, it took me >10
minutes to find the text in src/backend/access/transam/README, in the
second half of the "Writing Hints" section, that explains the overall
principle here, and since the patch set doesn't seem to touch that
text, maybe you weren't even aware it was there. And, it's a little
weird to have a single WAL record that is either a hint or not a hint
depending on a complex set of conditions. (IMHO mixing & and &&
without parentheses is quite brave, and an explicit != 0 might not be
a bad idea either.)
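
To illustrate the precedence point, here is a minimal standalone C sketch. It assumes a made-up flag bit, DEMO_SET_PD_ALL_VIS, standing in for XLHP_SET_PD_ALL_VIS (the real value is not shown in this thread), and the function names are hypothetical. Since & binds tighter than && in C, both forms evaluate identically; the second only makes the grouping and the nonzero test visible to the reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, standing in for XLHP_SET_PD_ALL_VIS. */
#define DEMO_SET_PD_ALL_VIS (1 << 3)

/* As written in the quoted hunk: relies on & binding tighter than &&. */
static bool
implicit_form(uint8_t flags, bool hint_bit_needed)
{
	return flags & DEMO_SET_PD_ALL_VIS && hint_bit_needed;
}

/* Same meaning, with the grouping and the != 0 test spelled out. */
static bool
explicit_form(uint8_t flags, bool hint_bit_needed)
{
	return ((flags & DEMO_SET_PD_ALL_VIS) != 0) && hint_bit_needed;
}

/* Exhaustively confirm that the two spellings agree for every flag byte. */
static void
check_forms_agree(void)
{
	for (int flags = 0; flags < 256; flags++)
	{
		assert(implicit_form((uint8_t) flags, true) ==
			   explicit_form((uint8_t) flags, true));
		assert(implicit_form((uint8_t) flags, false) ==
			   explicit_form((uint8_t) flags, false));
	}
}
```

Calling check_forms_agree() from a test harness passes for all 256 flag values, so the suggestion is purely about readability, not behavior.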

Anyway, I kind of wonder if it's time to back out the hack that I
installed here many years ago. At the time, I thought that it would be
bad if a VACUUM swept over the visibility map setting VM bits and as a
result emitted an FPI for every page in the entire heap ... but
everyone who is running with checksums has accepted that cost already,
and with those being the default, that's probably going to be most
people. It would be even more compelling if we were going to freeze,
prune, and set all-visible on access, because then presumably the case
where we touch a page and ONLY set the VM bit would be rare, so the
cost of doing that wouldn't matter much, but I guess the patch doesn't
go that far -- we can freeze or set all-visible on access but not
prune, without which the scenario I was worrying about at the time is
still fairly plausible, I think, if checksums are turned off.

-- 
Robert Haas
EDB: http://www.enterprisedb.com






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-06 22:40  Melanie Plageman <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-10-06 22:40 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Sep 24, 2025 at 4:13 PM Robert Haas <[email protected]> wrote:
>
> I find this patch set quite hard to follow. 0001 altogether removes
> the use of XLOG_HEAP2_VISIBLE in cases where we use
> XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
> patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
> assisted by 0009 and 0010, and then later you come back and remove the
> other half of the dependency. I know it was I who proposed (off-list)
> first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
> and not the heap buffer, but I'm not sure that idea quite worked out
> in terms of making this easier to follow. At the least, it seems weird
> that COPY FREEZE is an exception that gets handled in a different way
> than all the other cases, fully removing the dependency in one step.
> It would also be nice if each time you repost this, or maybe in a
> README that you post along beside the actual patches, you'd include
> some kind of roadmap to help the reader understand the internal
> structure of the patch set, like 1 does this, 2-9 get us to here,
> 10-whatever get us to this next place.

In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
entirely, rather than first removing each caller's heap page from the
VM WAL chain. I reordered changes and squashed several refactoring
patches to improve patch-by-patch readability. This should make the
set read differently from earlier versions that removed
XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.

I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
intermediate patches that just set PD_ALL_VISIBLE when making other
heap page changes are more confusing than helpful. Also, I think having
separate flags for setting PD_ALL_VISIBLE in the WAL record
over-complicated the code.

0001:  remove XLOG_HEAP2_VISIBLE from COPY FREEZE
0002 - 0005: various refactoring in advance of removing
XLOG_HEAP2_VISIBLE in pruning
0006: Pruning and freezing by vacuum sets the VM and emits a single
WAL record with those changes
0007: Reaping (phase III) by vacuum sets the VM and sets line pointers
unused in a single WAL record
0008 - 0009: XLOG_HEAP2_VISIBLE is eliminated
0010 - 0012: preparation for setting VM on-access
0013: set VM on-access
0014: set pd_prune_xid on insert

> I find myself fearful of the way that 0007 propagates the existing
> hacks around setting the VM bit into a new place:
>
> +               /*
> +                * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
> +                * careful not to emit a full page image unless
> +                * checksums/wal_log_hints are enabled. We only set the heap page LSN
> +                * if full page images were an option when emitting WAL. Otherwise,
> +                * subsequent modifications of the page may incorrectly skip emitting
> +                * a full page image.
> +                */
> +               if (do_prune || nplans > 0 ||
> +                       (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
> +                       PageSetLSN(page, lsn);
>
> I suppose it's not the worst thing to duplicate this logic, because
> you're later going to remove the original copy. But, it took me >10
> minutes to find the text in src/backend/access/transam/README, in the
> second half of the "Writing Hints" section, that explains the overall
> principle here, and since the patch set doesn't seem to touch that
> text, maybe you weren't even aware it was there.

I don't think that src/backend/access/transam/README must change with
my patch. It is still true that if the only change we are making to
the heap page is setting PD_ALL_VISIBLE and checksums/wal_log_hints
are disabled, we explicitly avoid an FPI and thus can't stamp the page
LSN.
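
As a rough illustration of that rule, the decision can be modeled as a standalone C function. This is only a sketch of the condition quoted above, not code from the patch set: should_stamp_page_lsn() and its parameter names are hypothetical, and hint_fpi_possible stands in for the result of XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the quoted condition. All names are stand-ins;
 * hint_fpi_possible plays the role of XLogHintBitIsNeeded().
 */
static bool
should_stamp_page_lsn(bool do_prune, int nplans,
					  bool set_pd_all_vis, bool hint_fpi_possible)
{
	/* Pruning or freezing changes the page itself, so always stamp. */
	if (do_prune || nplans > 0)
		return true;

	/* Only PD_ALL_VISIBLE changed: stamp only if an FPI was an option. */
	return set_pd_all_vis && hint_fpi_possible;
}

/* The one case the special handling exists for. */
static void
check_hint_only_case(void)
{
	/* PD_ALL_VISIBLE only, checksums/wal_log_hints off: no LSN stamp. */
	assert(!should_stamp_page_lsn(false, 0, true, false));
	/* Same change with checksums/wal_log_hints on: stamp the LSN. */
	assert(should_stamp_page_lsn(false, 0, true, true));
}
```

In this model the only input combination that sets PD_ALL_VISIBLE without stamping the LSN is the one where nothing else on the page changed and a full page image was not an option.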

> And, it's a little
> weird to have a single WAL record that is either a hint or not a hint
> depending on a complex set of conditions.

PD_ALL_VISIBLE is different from tuple hints and other page hints
because setting the VM is always WAL logged and when we replay that,
it will always set PD_ALL_VISIBLE, so PD_ALL_VISIBLE is effectively
always WAL-logged. The other hints aren't WAL-logged unless checksums
are enabled and we need an FPI. So PD_ALL_VISIBLE is different from
other page hints in multiple ways. We can't make it more like those
hints because of needing to preserve the invariant that the VM is
never set when the page is clear. The only thing we could do is forbid
omitting the FPI even when checksums are not enabled.
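
The invariant mentioned here (the VM bit is never set while the page-level bit is clear) can be written down as a tiny standalone C model. This is a sketch with invented names (DemoPageState and the demo_* functions are hypothetical), and it ignores locking and WAL entirely; it only shows which flag combinations are allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two places visibility is recorded. */
typedef struct DemoPageState
{
	bool		pd_all_visible;	/* page-level flag on the heap page */
	bool		vm_all_visible;	/* corresponding bit in the VM */
} DemoPageState;

/* VM bit set implies the page-level flag is set. */
static bool
demo_invariant_holds(const DemoPageState *s)
{
	return !s->vm_all_visible || s->pd_all_visible;
}

/* Setting: page-level flag first, so the VM bit never leads it. */
static void
demo_set_all_visible(DemoPageState *s)
{
	s->pd_all_visible = true;
	assert(demo_invariant_holds(s));
	s->vm_all_visible = true;
	assert(demo_invariant_holds(s));
}

/* Clearing: VM bit first, for the same reason. */
static void
demo_clear_all_visible(DemoPageState *s)
{
	s->vm_all_visible = false;
	assert(demo_invariant_holds(s));
	s->pd_all_visible = false;
	assert(demo_invariant_holds(s));
}
```

Of the four flag combinations, only "VM bit set, page flag clear" is forbidden; the page flag being set while the VM bit is clear is allowed.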

> Anyway, I kind of wonder if it's time to back out the hack that I
> installed here many years ago. At the time, I thought that it would be
> bad if a VACUUM swept over the visibility map setting VM bits and as a
> result emitted an FPI for every page in the entire heap ... but
> everyone who is running with checksums has accepted that cost already,
> and with those being the default, that's probably going to be most
> people.

I agree that PD_ALL_VISIBLE persistence is complicated, but we have
other special cases that complicate the code for a performance
benefit. I guess the question is if we are saying people shouldn't run
without checksums in production. If that's true, then it's fine to
remove this optimization. Otherwise, I'm not so sure.

I think cloud providers generally have checksums enabled, but I don't
know what is common on-prem.

> It would be even more compelling if we were going to freeze,
> prune, and set all-visible on access, because then presumably the case
> where we touch a page and ONLY set the VM bit would be rare, so the
> cost of doing that wouldn't matter much, but I guess the patch doesn't
> go that far -- we can freeze or set all-visible on access but not
> prune, without which the scenario I was worrying about at the time is
> still fairly plausible, I think, if checksums are turned off.

With the whole set applied, we can prune and set the VM on access but
not freeze. I have a patch to do that, but it introduced noticeable
CPU overhead to prepare the freeze plans. I'd have to spend much more
time studying it to avoid regressing workloads where we don't end up
freezing but prepare the freeze plans during SELECT queries.

- Melanie


Attachments:

  [text/x-patch] v16-0005-Make-heap_page_is_all_visible-independent-of-LVR.patch (5.9K, 2-v16-0005-Make-heap_page_is_all_visible-independent-of-LVR.patch)
  download | inline diff:
From 280948d3f1f18b8a6c473d6b56023b0c795f0efa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 3 Oct 2025 15:57:02 -0400
Subject: [PATCH v16 05/14] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++-------------
 src/include/access/heapam.h          |  6 ++++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8eef436dd10..aed1f8e1139 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2014,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2917,8 +2916,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen,
+								 &visibility_cutoff_xid,
+								 &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3608,15 +3609,20 @@ dead_items_cleanup(LVRelState *vacrel)
  * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
  * on this page is frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
-static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3639,7 +3645,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3663,10 +3669,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3685,8 +3690,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+					if (!TransactionIdPrecedes(xmin, OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3721,7 +3725,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc71fef6643..ea67fb83fbe 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -432,6 +432,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0



  [text/x-patch] v16-0004-Update-PruneState.all_-visible-frozen-earlier-in.patch (14.8K, 3-v16-0004-Update-PruneState.all_-visible-frozen-earlier-in.patch)
  download | inline diff:
From a5772e0eec65df1cf064055b1ba77a51861f7fe8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v16 04/14] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen when dead items are present. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags promptly avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). At present this has no runtime effect
because all callers that consider setting the VM also attempt freezing,
but future callers (e.g. on-access pruning) may want to set the VM
without preparing freeze plans.

We also used to defer clearing all_visible and all_frozen until after
computing the visibility cutoff XID. By determining the cutoff earlier,
we can update these flags immediately after deciding whether to
opportunistically freeze. This is necessary if we want to set the VM in
the same WAL record that prunes and freezes tuples on the page.

While we are at it, unset all_frozen whenever we unset all_visible.
Previously, all_frozen was only meaningful in combination with
all_visible, because all_frozen was not unset when we encountered tuples
that were not all-visible. It is best to keep them both up-to-date to
avoid mistakes when using all_frozen.
---
 src/backend/access/heap/pruneheap.c  | 145 ++++++++++++++-------------
 src/backend/access/heap/vacuumlazy.c |   9 +-
 2 files changed, 78 insertions(+), 76 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f819ab57d55..c23a6a21a7f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -137,15 +137,12 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze(), to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether or not to opportunistically
+	 * freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -308,7 +305,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * pre-freeze checks.
  *
  * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record should we decide to freeze
+ * tuples.
  *
  * prstate is an input/output parameter.
  *
@@ -320,7 +319,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -357,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -388,6 +390,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it.  Otherwise we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -432,10 +450,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
  * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set.  They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set.  They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -471,6 +490,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
@@ -540,10 +560,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -778,8 +798,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
 
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -838,27 +876,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -882,30 +901,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
@@ -1285,8 +1282,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1412,7 +1412,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1434,7 +1434,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1447,7 +1447,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1466,7 +1466,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1484,7 +1484,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1552,8 +1552,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6125f157709..8eef436dd10 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2007,7 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2060,6 +2059,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2165,11 +2165,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0
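
The hunks above clear all_visible and all_frozen together at every site, which is what lets lazy_scan_prune() test presult.all_frozen alone. The invariant can be sketched in isolation as below. This is a minimal illustration, not code from the patch set: PageVisFlags and the vis_flags_* helpers are hypothetical names that stand in for the two flags in PruneState.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two visibility flags kept in PruneState. */
typedef struct PageVisFlags
{
	bool		all_visible;
	bool		all_frozen;
} PageVisFlags;

/* Clear both flags together, mirroring the patched assignment sites. */
static inline void
vis_flags_clear(PageVisFlags *f)
{
	f->all_visible = f->all_frozen = false;
}

/* Only all_frozen drops, e.g. when prepared freeze plans are discarded. */
static inline void
vis_flags_clear_frozen(PageVisFlags *f)
{
	f->all_frozen = false;
}

/*
 * all_frozen implies all_visible.  Same shape as the assertion added to
 * lazy_scan_prune(): Assert(!presult.all_frozen || presult.all_visible).
 */
static inline bool
vis_flags_valid(const PageVisFlags *f)
{
	return !f->all_frozen || f->all_visible;
}
```

With the invariant maintained at every assignment, a caller that sees all_frozen set can assume all_visible without checking it separately.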



  [text/x-patch] v16-0002-Assorted-trivial-heap_page_prune_and_freeze-clea.patch (15.6K, 4-v16-0002-Assorted-trivial-heap_page_prune_and_freeze-clea.patch)
  download | inline diff:
From 33a35d23ae88d634cb01024295099e5d5466b1a3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v16 02/14] Assorted trivial heap_page_prune_and_freeze cleanup

Group heap_page_prune_and_freeze() input parameters in a struct and
clean up their documentation.

Rename a member of PruneState and disambiguate some local
heap_page_prune_and_freeze() variables.
---
 src/backend/access/heap/pruneheap.c  | 114 +++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  16 ++--
 src/include/access/heapam.h          |  62 ++++++++++++---
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 115 insertions(+), 78 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..9ba89b1fc28 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,8 +42,8 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
-	struct VacuumCutoffs *cutoffs;
+	bool		attempt_freeze;
+	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
 	 * Fields describing what to do to the page
@@ -253,15 +253,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
 			OffsetNumber dummy_off_loc;
+			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.relation = relation;
+			params.buffer = buffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
+
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			params.options = 0;
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -303,60 +311,43 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now.  The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set.  They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -365,15 +356,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
-	bool		hint_bit_fpi;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -394,7 +386,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -441,7 +433,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -467,7 +459,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -555,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -663,7 +655,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -671,7 +663,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -702,16 +694,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 * Freezing would make the page all-frozen.  Have already
 				 * emitted an FPI or will do so anyway?
 				 */
-				if (RelationNeedsWAL(relation))
+				if (RelationNeedsWAL(params->relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_prune)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -753,7 +745,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_prune)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -796,7 +788,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -834,9 +826,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -894,7 +886,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1476,7 +1468,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab6938d1da1..6125f157709 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1951,10 +1951,16 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
+	params.relation = rel;
+	params.buffer = buf;
+	params.reason = PRUNE_VACUUM_SCAN;
+	params.cutoffs = &vacrel->cutoffs;
+	params.vistest = vacrel->vistest;
+
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
 	 *
@@ -1970,12 +1976,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e60d34dad25..bc71fef6643 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It is used to set the WAL record
+	 * opcode which is used for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+	 * pruning.
+	 *
+	 * FREEZE indicates that we will also freeze tuples, and will return
+	 * 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
+	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
+	 * option is set. cutoffs->OldestXmin is also used to determine if dead
+	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..8a626d633d5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2340,6 +2340,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0
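
The parameter-struct pattern introduced by this patch can be sketched on its own as follows. This is an illustration under stated assumptions, not the patch's code: every Demo* and DEMO_* name is hypothetical, and the sketch uses C99 designated initializers, whereas the patch assigns each member individually. One possible benefit of the initializer form is that any member not named is zero-initialized, so a field added to the struct later cannot be left indeterminate by an existing caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, modelled on HEAP_PAGE_PRUNE_* in the patch. */
#define DEMO_PRUNE_MARK_UNUSED_NOW	(1 << 0)
#define DEMO_PRUNE_FREEZE			(1 << 1)

typedef enum
{
	DEMO_ON_ACCESS,
	DEMO_VACUUM_SCAN
} DemoReason;

typedef struct DemoCutoffs
{
	unsigned	oldest_xmin;
} DemoCutoffs;

/* Hypothetical analogue of PruneFreezeParams. */
typedef struct DemoPruneParams
{
	int			buffer;
	DemoReason	reason;
	int			options;
	const DemoCutoffs *cutoffs;
} DemoPruneParams;

/* Vacuum-style caller: always freezes, marks unused now if no indexes. */
static DemoPruneParams
demo_vacuum_params(int buffer, const DemoCutoffs *cutoffs, int nindexes)
{
	DemoPruneParams params = {
		.buffer = buffer,
		.reason = DEMO_VACUUM_SCAN,
		.options = DEMO_PRUNE_FREEZE,
		.cutoffs = cutoffs,
	};

	if (nindexes == 0)
		params.options |= DEMO_PRUNE_MARK_UNUSED_NOW;
	return params;
}

/* On-access caller: members not named (options, cutoffs) become 0/NULL. */
static DemoPruneParams
demo_on_access_params(int buffer)
{
	DemoPruneParams params = {
		.buffer = buffer,
		.reason = DEMO_ON_ACCESS,
	};

	return params;
}
```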



  [text/x-patch] v16-0003-Add-helper-for-freeze-determination-to-heap_page.patch (7.0K, 5-v16-0003-Add-helper-for-freeze-determination-to-heap_page.patch)
  download | inline diff:
From f269cdce51b10d0b5ccc0e047ff08b247e6adf89 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v16 03/14] Add helper for freeze determination to
 heap_page_prune_and_freeze

After scanning the line pointers on the heap page during vacuum's first
phase, we use several pieces of state collected along the way to decide
whether or not to execute the freeze plans we assembled.

Do this in a helper for better readability.
---
 src/backend/access/heap/pruneheap.c | 196 +++++++++++++++++-----------
 1 file changed, 117 insertions(+), 79 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9ba89b1fc28..f819ab57d55 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -301,6 +301,118 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -662,85 +774,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(params->relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_prune)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
+									  did_tuple_hint_fpi,
+									  do_prune,
+									  do_hint_prune,
+									  &prstate);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
-- 
2.43.0
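
The decision made by heap_page_will_freeze() can be read as a pure function of a handful of booleans. The sketch below restates that branch structure without any PostgreSQL dependencies. It is an approximation for illustration only: FreezeInputs and demo_will_freeze are hypothetical names, the WAL-related fields stand in for RelationNeedsWAL(), XLogHintBitIsNeeded() and XLogCheckBufferNeedsBackup(), and the helper's side effects (heap_pre_freeze_checks(), resetting all_frozen and nfrozen) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs to the freeze decision. */
typedef struct FreezeInputs
{
	bool		attempt_freeze;		/* caller asked for freezing */
	bool		freeze_required;	/* pagefrz.freeze_required */
	bool		would_be_all_frozen;	/* all_visible && all_frozen && nfrozen > 0 */
	bool		needs_wal;			/* stands in for RelationNeedsWAL() */
	bool		did_tuple_hint_fpi; /* hint-bit setting already emitted an FPI */
	bool		do_prune;			/* pruning will modify the page */
	bool		do_hint_prune;		/* only pd_prune_xid / full hint changes */
	bool		hint_bits_logged;	/* stands in for XLogHintBitIsNeeded() */
	bool		buffer_needs_fpi;	/* stands in for XLogCheckBufferNeedsBackup() */
} FreezeInputs;

/* Returns true if the prepared freeze plans should be executed. */
static bool
demo_will_freeze(const FreezeInputs *in)
{
	if (!in->attempt_freeze)
		return false;

	/* Freezing is mandatory to advance relfrozenxid/relminmxid. */
	if (in->freeze_required)
		return true;

	/* Opportunistic freezing only pays off if the page ends up all-frozen. */
	if (!in->would_be_all_frozen || !in->needs_wal)
		return false;

	/* Freeze if an FPI was already emitted or will be emitted anyway. */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_fpi;
	if (in->do_hint_prune)
		return in->hint_bits_logged && in->buffer_needs_fpi;

	return false;
}
```

Keeping the decision free of side effects in this form makes each branch easy to exercise in a unit test, which the in-tree helper cannot offer because it also validates and mutates prstate.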



  [text/x-patch] v16-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (12.1K, 6-v16-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch)
  download | inline diff:
From 4312376fff987b32d4599ccd78893c8c2f7770e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v16 01/14] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++------
 src/backend/access/heap/heapam_xlog.c   | 52 ++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 68 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  3 ++
 5 files changed, 154 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 * going to add further frozen rows to it.
 		 *
 		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(relation));
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..c2c7e6ab086 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,55 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Read and update the visibility map (VM) block.
+	 *
+	 * We must always redo VM changes, even if the corresponding heap page
+	 * update was skipped due to the LSN interlock. Each VM block covers
+	 * multiple heap pages, so later WAL records may update other bits in the
+	 * same block. If this record includes a full-page image (FPI), subsequent
+	 * WAL records may depend on it to guard against torn pages.
+	 *
+	 * Heap page changes are replayed first to preserve the invariant:
+	 * PD_ALL_VISIBLE must be set on the heap page if the VM bit is set.
+	 *
+	 * Note that we released the heap page lock above. Under normal operation,
+	 * this would be unsafe — a concurrent modification could clear
+	 * PD_ALL_VISIBLE while the VM bit remained set, violating the invariant.
+	 *
+	 * During recovery, however, no concurrent writers exist. Therefore,
+	 * updating the VM without holding the heap page lock is safe enough. This
+	 * same approach is taken when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN,
+								 relname);
+
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		pfree(relname);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..738105eb97e 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,71 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set visibility map (VM) flags in the block referenced by vmBuf.
+ *
+ * This function is intended for callers that log VM changes together
+ * with the heap page modifications that rendered the page all-visible.
+ * Callers that log VM changes separately should use visibilitymap_set().
+ *
+ * Caller responsibilities:
+ * - vmBuf must be pinned and exclusively locked, and it must cover the
+ *   VM bits corresponding to heapBlk.
+ * - In normal operation (not recovery), this must be called inside a
+ *   critical section that also applies the necessary heap page changes
+ *   and, if applicable, emits WAL.
+ * - The caller is responsible for WAL logging the VM buffer changes and
+ *   for any required modifications to the associated heap page. This
+ *   includes preserving invariants such as holding a pin and exclusive
+ *   lock on the buffer containing heapBlk.
+ *
+ * heapRelname is used only for debugging.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags,
+						 const char *heapRelname)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %u",
+		 flags, heapRelname, heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags,
+									  const char *heapRelname);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
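
As a reading aid for visibilitymap_set_vmbits() in the patch above, here is a
minimal standalone sketch of the bit addressing and the set-only update. It is
not PostgreSQL code: every sketch_/SKETCH_ name is hypothetical, there is no
buffer manager or locking, and the 24-byte page header (hence 8168 map bytes
per 8192-byte VM page) is an assumption matching a default build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 8192-byte page minus a 24-byte (MAXALIGNed) page header. */
#define SKETCH_MAPSIZE				(8192 - 24)
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_HEAPBLOCKS_PER_BYTE	4
#define SKETCH_HEAPBLOCKS_PER_PAGE	(SKETCH_MAPSIZE * SKETCH_HEAPBLOCKS_PER_BYTE)

#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02
#define SKETCH_VM_VALID_BITS	0x03

/* Hypothetical stand-in for one pinned, exclusively locked VM page. */
typedef struct SketchVMPage
{
	uint8_t		map[SKETCH_MAPSIZE];
	bool		dirty;
} SketchVMPage;

/*
 * Set (never clear) VM bits for heap_blk and return the bits that were set
 * before the call, mirroring the return value of the patched function.
 */
static uint8_t
sketch_set_vmbits(SketchVMPage *vm, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = (heap_blk % SKETCH_HEAPBLOCKS_PER_PAGE) /
		SKETCH_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
										SKETCH_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Only valid bits, and never all-frozen without all-visible. */
	assert((flags & SKETCH_VM_VALID_BITS) == flags);
	assert(flags != SKETCH_VM_ALL_FROZEN);

	status = (vm->map[map_byte] >> map_offset) & SKETCH_VM_VALID_BITS;
	if (flags != status)
	{
		/* OR the new bits in; existing bits are left untouched. */
		vm->map[map_byte] |= (uint8_t) (flags << map_offset);
		vm->dirty = true;
	}

	return status;
}
```

Returning the previous bits lets a caller tell whether the page changed. The
redo routine for prune/freeze in the next patch compares the returned value
with the requested flags for that purpose before stamping the VM page LSN.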



  [text/x-patch] v16-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patch (50.6K, 7-v16-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patch)
From 0141c10d30bd7ea620d16d24201ba22e5337a4dc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:52:08 -0400
Subject: [PATCH v16 06/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum
 prune/freeze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum’s prune/freeze work, not to pruning
performed during normal page access.
---
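
Reading aid, not part of the patch: a minimal sketch of how the redo side in
heap_xlog_prune_freeze() decides whether to dirty the heap page and whether to
advance its LSN. The struct and function names are hypothetical. The inputs
stand in for do_prune, nplans, a nonzero vmflags, PageIsAllVisible() and
XLogHintBitIsNeeded() as they are used in the hunk below.

```c
#include <stdbool.h>

/* Hypothetical result type; the patch uses local booleans instead. */
typedef struct SketchHeapRedoDecision
{
	bool		set_pd_all_visible;
	bool		mark_buffer_dirty;
	bool		set_lsn;
} SketchHeapRedoDecision;

static SketchHeapRedoDecision
sketch_decide_heap_redo(bool do_prune, int nplans, bool record_sets_vm,
						bool page_is_all_visible, bool hint_wal_needed)
{
	SketchHeapRedoDecision d = {false, false, false};

	/* Pruning or freezing always modifies the heap page. */
	if (do_prune || nplans > 0)
		d.mark_buffer_dirty = d.set_lsn = true;

	/*
	 * The VM bit must never be set while PD_ALL_VISIBLE is clear, so set the
	 * page-level bit first if it is missing. The LSN only advances for this
	 * change when the record could have carried a full-page image.
	 */
	if (record_sets_vm && !page_is_all_visible)
	{
		d.set_pd_all_visible = true;
		d.mark_buffer_dirty = true;
		if (hint_wal_needed)
			d.set_lsn = true;
	}

	return d;
}
```

Under these assumptions, a record that only sets the VM for a page already
marked PD_ALL_VISIBLE leaves the heap page untouched, which matches the "no
need to dirty the heap page" comment in the hunk.
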
 src/backend/access/heap/heapam_xlog.c  | 158 +++++++--
 src/backend/access/heap/pruneheap.c    | 474 ++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c   | 202 +----------
 src/backend/access/rmgrdesc/heapdesc.c |  11 +-
 src/include/access/heapam.h            |  36 +-
 src/include/access/heapam_xlog.h       |  17 +-
 6 files changed, 584 insertions(+), 314 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c2c7e6ab086..911416bbc56 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+	{
+		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -90,6 +103,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+		bool		do_prune;
+		bool		mark_buffer_dirty = false;
+		bool		set_lsn = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -97,11 +113,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +159,121 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
+
+		/*
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * If this record only sets the VM, no need to dirty the heap page.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * Always emit a WAL record when setting PD_ALL_VISIBLE but only
+			 * emit an FPI if checksums/wal_log_hints are enabled. Advance the
+			 * page LSN only if the record could include an FPI, since
+			 * recovery skips records <= the stamped LSN. Otherwise it might
+			 * skip an earlier FPI needed to repair a torn page.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
+			PageSetLSN(page, lsn);
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
-
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+	 * VM, update the freespace map.
+	 *
+	 * Even when no actual space is freed (e.g., when only marking the page
+	 * all-visible or frozen), we still update the FSM. Because the FSM is
+	 * unlogged and maintained heuristically, it often becomes stale on
+	 * standbys. If such a standby is later promoted and runs VACUUM, it will
+	 * skip recalculating free space for pages that were marked all-visible
+	 * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
+	 * propagate overly optimistic free space values upward, causing future
+	 * insertions to select pages that turn out to be unusable. In bulk, this
+	 * can lead to long stalls.
+	 *
+	 * To prevent this, always refresh the FSM’s view when a page becomes
+	 * all-visible or all-frozen.
+	 *
+	 * Do this regardless of whether a full-page image is logged, since FSM
+	 * data is not part of the page itself.
 	 *
-	 * Do this regardless of a full-page image being applied, since the FSM
-	 * data is not in the page anyway.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			(vmflags & VISIBILITYMAP_VALID_BITS))
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
-			UnlockReleaseBuffer(buffer);
+		/*
+		 * We want to avoid holding an exclusive lock on the heap buffer while
+		 * doing IO (either of the FSM or the VM), so we'll release the lock
+		 * on the heap buffer before doing either.
+		 */
+		UnlockReleaseBuffer(buffer);
+	}
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * We must redo changes to the VM even if the heap page was skipped due to
+	 * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+	 * on replaying changes to the VM.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+		uint8		old_vmbits = 0;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+		pfree(relname);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c23a6a21a7f..f384d74416a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -43,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -132,17 +135,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze(), to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether or not to opportunistically
-	 * freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -173,6 +176,19 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
+
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -258,6 +274,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
+			params.vmbuffer = InvalidBuffer;
+			params.blk_known_av = false;
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -431,10 +449,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -449,12 +565,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * it's required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -479,6 +596,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -488,15 +606,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
 	prstate.mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = params->cutoffs;
 
 	/*
@@ -543,50 +668,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * when we encounter LP_DEAD items. Instead, we correct all_visible after
+	 * deciding whether to freeze, but before updating the VM, to avoid
+	 * setting the VM bit incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.attempt_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -818,6 +947,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -838,14 +996,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -859,64 +1020,91 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  RelationGetRelationName(params->relation));
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
 			log_heap_prune_and_freeze(params->relation, buffer,
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
 									  prstate.nowunused, prstate.nunused);
-		}
 	}
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -2058,6 +2246,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		conflict_xid = InvalidTransactionId;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2078,14 +2324,24 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2351,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2360,23 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 * Note that if we explicitly skip an FPI, we must not set the heap page
+	 * LSN later.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2112,7 +2384,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2169,6 +2445,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+	{
+		xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+			xlrec.flags |= XLHP_VM_ALL_FROZEN;
+	}
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2483,23 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * We must bump the page LSN if pruning or freezing. If we are only
+	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
+	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+	 * for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index aed1f8e1139..39526bf608f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1958,6 +1958,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	params.reason = PRUNE_VACUUM_SCAN;
 	params.cutoffs = &vacrel->cutoffs;
 	params.vistest = vacrel->vistest;
+	params.vmbuffer = vmbuffer;
+	params.blk_known_av = all_visible_according_to_vm;
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1974,7 +1976,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	params.options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -1997,33 +1999,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2057,168 +2032,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2892,8 +2725,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  InvalidBuffer,	/* vmbuffer */
+								  0,	/* vmflags */
+								  InvalidTransactionId, /* conflict_xid */
 								  false,	/* no cleanup lock required */
+								  false,	/* set_pd_all_vis */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+		{
+			uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ea67fb83fbe..2de39ba0cd1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
 	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
 	 * pruning.
 	 *
-	 * FREEZE indicates that we will also freeze tuples, and will return
-	 * 'all_visible', 'all_frozen' flags to the caller.
+	 * FREEZE indicates that we will also freeze tuples.
+	 *
+	 * UPDATE_VIS indicates that we will set the page's status in the VM.
 	 */
 	int			options;
 
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -420,8 +428,10 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..16c2b2e3c9c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,7 +292,7 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
@@ -330,6 +330,15 @@ typedef struct xl_heap_prune
 #define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
 #define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
 
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define		XLHP_VM_ALL_VISIBLE			(1 << 8)
+#define		XLHP_VM_ALL_FROZEN			(1 << 9)
+
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
  * (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -497,7 +506,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
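
As a reading aid for the get_conflict_xid() hunk in the patch above, here is a
minimal standalone sketch of the same decision order. It is not PostgreSQL
code: xid_t, INVALID_XID, FIRST_NORMAL_XID, xid_follows() and
sketch_conflict_xid() are hypothetical stand-ins, and xid_follows() is only a
simplified model of TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and its special values. */
typedef uint32_t xid_t;
#define INVALID_XID			((xid_t) 0)
#define FIRST_NORMAL_XID	((xid_t) 3)

/* Simplified model of TransactionIdFollows(): is a logically newer than b? */
static bool
xid_follows(xid_t a, xid_t b)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	/* Normal XIDs compare modulo 2^32 to tolerate wraparound. */
	return (int32_t) (a - b) > 0;
}

/* Mirrors the order of decisions in the patch's get_conflict_xid(). */
static xid_t
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					xid_t latest_xid_removed, xid_t frz_conflict_horizon,
					xid_t visibility_cutoff_xid, bool blk_already_av,
					bool set_blk_all_frozen)
{
	xid_t		conflict_xid = INVALID_XID;

	/* The caller only passes set_blk_all_frozen together with do_set_vm. */
	assert(!set_blk_all_frozen || do_set_vm);

	/* Start from the VM horizon if setting the VM, else the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer xmax among the removed tuples takes precedence. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* VM-only update of an already all-visible page: no horizon needed. */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		conflict_xid = INVALID_XID;

	return conflict_xid;
}
```

The last branch is the only one that discards a horizon: the page was already
all-visible and nothing on it is pruned or frozen by this record.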



  [text/x-patch] v16-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (10.2K, 8-v16-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From 95d94ee991ea163b4b7861a193b3a1a3497de73e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:54:38 -0400
Subject: [PATCH v16 07/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III

Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that becomes all-visible in vacuum's third phase, record the
visibility map update in the already emitted
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.

Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.

This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 174 +++++++++++++++++++--------
 1 file changed, 124 insertions(+), 50 deletions(-)
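
The commit message says the visibility check now runs before the dead items
are marked unused. A toy model of that idea follows, assuming the check simply
treats every offset in the caller's dead-item list as if it were already
unused. The item_state enum and page_would_be_all_visible() are hypothetical
and far simpler than the real heap_page_would_be_all_visible(), which also
computes all_frozen and the visibility cutoff XID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical line pointer states for a toy heap page. */
typedef enum
{
	ITEM_UNUSED,
	ITEM_DEAD,
	ITEM_LIVE_VISIBLE,
	ITEM_LIVE_NOT_VISIBLE
} item_state;

/*
 * Return true if the page would be all-visible once every offset listed in
 * deadoffsets is marked unused.  deadoffsets must be sorted ascending and is
 * 0-based in this toy model.
 */
static bool
page_would_be_all_visible(const item_state *items, size_t nitems,
						  const size_t *deadoffsets, size_t ndead)
{
	size_t		next_dead = 0;

	for (size_t off = 0; off < nitems; off++)
	{
		/* Items about to be reaped are treated as already unused. */
		if (next_dead < ndead && deadoffsets[next_dead] == off)
		{
			assert(items[off] == ITEM_DEAD);
			next_dead++;
			continue;
		}

		if (items[off] == ITEM_UNUSED)
			continue;

		/* Any other dead item or not-yet-visible tuple disqualifies the page. */
		if (items[off] != ITEM_LIVE_VISIBLE)
			return false;
	}

	return true;
}
```

Because the answer is known before the critical section starts, the caller can
take the VM buffer lock up front and apply both changes under one WAL record.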

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 39526bf608f..cf1c2efc999 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,6 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2685,8 +2692,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2697,6 +2706,31 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	/*
+	 * Before marking dead items unused, check whether the page will become
+	 * all-visible once that change is applied. This lets us reap the tuples
+	 * and mark the page all-visible within the same critical section,
+	 * enabling both changes to be emitted in a single WAL record. Since the
+	 * visibility checks may perform I/O and allocate memory, they must be
+	 * done outside the critical section.
+	 */
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2716,6 +2750,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	/*
+	 * The page is guaranteed to have had dead line pointers, so
+	 * PD_ALL_VISIBLE cannot be already set. Therefore, whenever we set the VM
+	 * bit, we must also set PD_ALL_VISIBLE. The heap page lock is held while
+	 * updating the VM to ensure consistency.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer, vmflags,
+								 RelationGetRelationName(vacrel->rel));
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2725,11 +2774,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidBuffer,	/* vmbuffer */
-								  0,	/* vmflags */
-								  InvalidTransactionId, /* conflict_xid */
+								  vmbuffer, vmflags,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
-								  false,	/* set_pd_all_vis */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -2737,41 +2785,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen,
-								 &visibility_cutoff_xid,
-								 &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3440,18 +3459,8 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * *logging_offnum will have the OffsetNumber of the current tuple being
- * processed for vacuum's error callback system.
- *
- * This is similar logic to that in heap_prune_record_unchanged_lp_normal() If
- * you change anything here, make sure that everything stays in sync.  Note
- * that an assertion calls us to verify that everybody still agrees.  Be sure
- * to avoid introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
@@ -3460,15 +3469,74 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Check whether the heap page in buf is all-visible except for the dead
+ * tuples referenced in the deadoffsets array.
+ *
+ * The visibility checks may perform IO and allocate memory so they must not
+ * be done in a critical section. This function is used by vacuum to determine
+ * if the page will be all-visible once it reaps known dead tuples. That way
+ * it can do both in the same critical section and emit a single WAL record.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Output parameters:
+ *
+ *  - *all_frozen: true if every tuple on the page is frozen
+ *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *logging_offnum: OffsetNumber of current tuple being processed;
+ *     used by vacuum's error callback system.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is closely related to heap_prune_record_unchanged_lp_normal().
+ * If you modify this function, ensure consistency with that code. An
+ * assertion cross-checks that both remain in agreement. Do not introduce new
+ * side-effects.
+ */
+static bool
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
+{
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3496,9 +3564,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
-- 
2.43.0



  [text/x-patch] v16-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 9-v16-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 3e79e84930ba110a0dbf4abe6b3c84f3c021c78a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v16 08/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cf1c2efc999..cf9de40ff3c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,9 +1877,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1896,13 +1899,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(vacrel->rel));
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v16-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.4K, 10-v16-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From d32451ace53d97e8e11deb12c87655c6e937ee0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v16 09/14] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type. This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  15 +--
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 56 insertions(+), 377 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(relation));
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(relation));
 		}
 
 		/*
@@ -8798,50 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 911416bbc56..69d1f0b8633 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -258,7 +258,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+		old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -276,142 +276,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -789,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -805,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 relname);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  relname);
 
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		pfree(relname);
@@ -1390,9 +1254,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f384d74416a..142781d0008 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1030,9 +1030,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  RelationGetRelationName(params->relation));
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   RelationGetRelationName(params->relation));
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2309,14 +2309,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cf9de40ff3c..bed77af23a2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1899,11 +1899,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(vacrel->rel));
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(vacrel->rel));
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2783,9 +2783,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 RelationGetRelationName(vacrel->rel));
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  RelationGetRelationName(vacrel->rel));
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 738105eb97e..dfa6113f0a9 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,107 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set visibility map (VM) flags in the block referenced by vmBuf.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * heapRelname is used only for debugging.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const char *heapRelname)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const char *heapRelname);
+
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a626d633d5..48eb3cf4466 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4272,7 +4272,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v16-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.2K, 11-v16-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 1e4108e0c5b007fe55f12c29f4a47247ba023ef9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v16 10/14] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 16 ++++++++--------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 142781d0008..78e04f1d17c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -729,9 +729,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1154,11 +1154,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1616,7 +1616,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
+				 * could use GlobalVisXidVisibleToAll() instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
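
For readers following the rename: the 32-bit wrapper first widens the xid to a 64-bit value relative to a reference point (the FullXidRelativeTo() call in the hunk above) and then defers to the FullTransactionId variant. Below is a minimal sketch of that widening step only, under stated assumptions: xid32, xid64 and widen_xid_relative_to() are hypothetical stand-ins invented for illustration, not the tree's types or functions. It assumes the xid lies within 2^31 of the reference, which is what the "protects against xid wraparounds" caveat in the comment is about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and FullTransactionId. */
typedef uint32_t xid32;
typedef uint64_t xid64;

/*
 * Widen a 32-bit xid to 64 bits by placing it relative to a 64-bit
 * reference value.  Only meaningful if xid is within 2^31 of the
 * reference; otherwise the result lands in the wrong epoch.
 */
static xid64
widen_xid_relative_to(xid64 rel, xid32 xid)
{
	xid32		rel_xid = (xid32) rel;	/* low 32 bits of the reference */
	int32_t		delta;

	assert(xid != 0);			/* 0 stands in for InvalidTransactionId */

	/* signed circular distance from the reference to xid */
	delta = (int32_t) (xid - rel_xid);

	return rel + (int64_t) delta;
}
```

With a reference of (5 << 32) + 10, an xid of 4 widens into the same epoch, while an xid of 0xFFFFFFF0 is treated as slightly older than the reference and lands in epoch 4.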



  [text/x-patch] v16-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 12-v16-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From a28aef72286f446c53614621ebe7f8b65ee4b59b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v16 11/14] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
 src/backend/access/heap/pruneheap.c         | 37 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 17 +++++-----
 src/include/access/heapam.h                 |  7 ++--
 4 files changed, 57 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 78e04f1d17c..e5b16bd2b38 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -711,11 +711,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -911,6 +912,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1081,10 +1092,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1613,19 +1623,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisXidVisibleToAll() instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bed77af23a2..3af8a359e42 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2739,7 +2739,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3488,14 +3488,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3514,7 +3513,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3533,7 +3532,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3605,7 +3604,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3624,7 +3623,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2de39ba0cd1..df0632aebc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
 	/*
 	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
 	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
-	 * option is set. cutoffs->OldestXmin is also used to determine if dead
-	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 * option is set.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -443,7 +442,7 @@ extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -455,6 +454,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
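
The deferral described in the commit message (track the newest xmin while scanning, then run the expensive visibility test once per page) can be sketched roughly as follows. This is a toy model, not the patch's code: vis_state, xid_visible_to_all() and page_is_all_visible() are hypothetical names, the "expensive" test is reduced to a single horizon comparison with a call counter, and the input is assumed to be the xmins of live, committed tuples only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid32;			/* hypothetical stand-in for TransactionId */

#define FIRST_NORMAL_XID 3u

/* Hypothetical stand-in for GlobalVisState; ncalls counts expensive checks. */
typedef struct
{
	xid32		horizon;
	int			ncalls;
} vis_state;

static bool
xid_is_normal(xid32 xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Toy version of the expensive check: is xid older than the horizon? */
static bool
xid_visible_to_all(vis_state *vs, xid32 xid)
{
	vs->ncalls++;
	return (int32_t) (xid - vs->horizon) < 0;
}

/*
 * Scan all xmins, remembering only the newest one, then test that single
 * value.  If the newest xmin is visible to all, every older one is too.
 */
static bool
page_is_all_visible(vis_state *vs, const xid32 *xmins, size_t n)
{
	xid32		newest = 0;		/* 0 plays the role of InvalidTransactionId */
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (!xid_is_normal(xmins[i]))
			continue;
		if (!xid_is_normal(newest) || (int32_t) (xmins[i] - newest) > 0)
			newest = xmins[i];
	}

	if (xid_is_normal(newest) && !xid_visible_to_all(vs, newest))
		return false;

	return true;
}
```

The trade-off is the one the commit message names: the loop no longer bails out early on the first too-new xmin, but the costly test runs at most once per page instead of once per tuple.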



  [text/x-patch] v16-0012-Inline-TransactionIdFollows-Precedes.patch (5.0K, 13-v16-0012-Inline-TransactionIdFollows-Precedes.patch)
From 9ed00b821b89276c80382bc810e6a3368cc35521 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v16 12/14] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
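
As a side note on what is being inlined here: the "modulo-2^32 comparison" in the comment means two normal XIDs are compared by the sign of their 32-bit difference, so an XID just before the counter wraps still counts as older than one just after. A standalone sketch of that behavior follows, assuming XIDs below 3 are the permanent ones; xid32, FIRST_NORMAL_XID and xid_precedes() are illustrative names, not the header's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid32;			/* hypothetical stand-in for TransactionId */

#define FIRST_NORMAL_XID 3u		/* assumption: XIDs below this are permanent */

/* Is id1 logically older than id2, taking wraparound into account? */
static inline bool
xid_precedes(xid32 id1, xid32 id2)
{
	int32_t		diff;

	/* Permanent XIDs compare as plain unsigned values. */
	if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
		return id1 < id2;

	/* Normal XIDs: the sign of the 32-bit difference gives the order. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

For example, xid_precedes(0xFFFFFFF0, 5) is true even though the first argument is numerically larger, because 5 is only 21 steps ahead of it on the circle; a permanent XID such as 2 precedes any normal XID regardless of wraparound.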



  [text/x-patch] v16-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 14-v16-0014-Set-pd_prune_xid-on-insert.patch)
From bd82158f3836798a6ea9194e70e33b93980fbbde Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v16 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 69d1f0b8633..51f7961075f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -475,6 +475,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -624,9 +630,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
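
For readers following along, here is a minimal standalone sketch of the
hint-setting rule the patch above adds to heap_insert() and
heap_multi_insert(). It is a toy model, not PostgreSQL source: ToyPage and
the toy_* / xid_* helpers are hypothetical names invented for illustration.
Only the special XID values and the "keep the oldest XID" behavior of
PageSetPrunable() are meant to mirror the real thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* These values mirror PostgreSQL's special XIDs; the rest is a toy model. */
#define InvalidTransactionId     ((TransactionId) 0)
#define BootstrapTransactionId   ((TransactionId) 1)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for the single page-header field of interest. */
typedef struct ToyPage
{
	TransactionId pd_prune_xid;
} ToyPage;

/* Bootstrap and frozen XIDs are not "normal" and must not become hints. */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b", valid for two normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Models PageSetPrunable(): remember the oldest XID seen as the hint. */
static void
toy_set_prunable(ToyPage *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* Models the new heap_insert() step: no hint in bootstrap mode. */
static void
toy_heap_insert_hint(ToyPage *page, TransactionId xid)
{
	if (xid_is_normal(xid))
		toy_set_prunable(page, xid);
}

/* Models heap_multi_insert(): no hint if the page is being set all-frozen. */
static void
toy_multi_insert_hint(ToyPage *page, TransactionId xid, bool all_frozen_set)
{
	if (!all_frozen_set && xid_is_normal(xid))
		toy_set_prunable(page, xid);
}
```

The point of the model is that an insert-only page now carries a valid
pd_prune_xid, which is what lets a later on-access prune visit the page at
all.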



  [text/x-patch] v16-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.9K, 15-v16-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 13ff9fd8071f9b7aea07cca603c51a9a3cd659f1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v16 13/14] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 71 +++++++++++++++----
 src/backend/access/index/indexam.c            | 46 ++++++++++++
 src/backend/access/table/tableam.c            | 39 ++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 +++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 282 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e5b16bd2b38..fa3b38cdadc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -186,7 +186,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -201,9 +203,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -269,12 +275,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.options = 0;
+			params.vmbuffer = InvalidBuffer;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			params.relation = relation;
 			params.buffer = buffer;
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
-			params.vmbuffer = InvalidBuffer;
 			params.blk_known_av = false;
 
 			/*
@@ -455,6 +470,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -465,7 +483,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -481,6 +501,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -504,6 +541,9 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * XXX: This will never trigger for on-access pruning because it passes
+	 * blk_known_av as false. Should we remove that condition here?
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -912,6 +952,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -922,14 +970,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -973,6 +1013,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2245,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2314,8 +2355,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_heap_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index df0632aebc6..59d8ce9ad42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-08 22:54  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 3 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-10-08 22:54 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Oct 6, 2025 at 6:40 PM Melanie Plageman
<[email protected]> wrote:
>
> In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
> entirely, rather than first removing each caller's heap page from the
> VM WAL chain. I reordered changes and squashed several refactoring
> patches to improve patch-by-patch readability. This should make the
> set read differently from earlier versions that removed
> XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.
>
> I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
> having intermediate patches that just set PD_ALL_VISIBLE when making
> other heap pages are more confusing than helpful. Also, I think having
> separate flags for setting PD_ALL_VISIBLE in the WAL record
> over-complicated the code.

I decided to reorder the patches to remove XLOG_HEAP2_VISIBLE from
vacuum phase III before removing it from vacuum phase I because
removing it from phase III doesn't require preliminary refactoring
patches. I've done that in the attached v17.

I've also added an experimental patch on the end that refactors large
chunks of heap_page_prune_and_freeze() into helpers. I got some
feedback off-list that heap_page_prune_and_freeze() is too unwieldy
now. I'm not sure how I feel about them yet, so I haven't documented
them or moved them up in the patch set to before changes to
heap_page_prune_and_freeze().

0001: Eliminate XLOG_HEAP2_VISIBLE from COPY FREEZE
0002: Eliminate XLOG_HEAP2_VISIBLE from phase III of vacuum
0003 - 0006: cleanup and refactoring to prepare for 0007
0007: Eliminate XLOG_HEAP2_VISIBLE from vacuum prune/freeze
0008 - 0009: Remove XLOG_HEAP2_VISIBLE
0010 - 0012: refactoring to prepare for 0013
0013: Set VM on-access
0014: Set pd_prune_xid on insert
0015: Experimental refactoring of heap_page_prune_and_freeze into helpers

- Melanie


Attachments:

  [text/x-patch] v17-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (12.0K, 2-v17-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch)
From 5c94a9cea77820235f62719b9e760adb6fbbc615 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v17 01/15] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++------
 src/backend/access/heap/heapam_xlog.c   | 52 ++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 68 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  3 ++
 5 files changed, 154 insertions(+), 18 deletions(-)
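
The redo routine changed by this patch treats block 0 (the heap page) and block 1 (the VM page) independently: each is applied only when its own page LSN is older than the record being replayed. Below is a minimal standalone sketch of that idea, not PostgreSQL code. ToyPage, toy_needs_redo(), toy_replay() and the TOY_* constants are hypothetical names made up for illustration, and full-page images are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PD_ALL_VISIBLE and the two VM bits. */
#define TOY_PD_ALL_VISIBLE 0x01
#define TOY_VM_BITS        0x03

/* A toy page: just an LSN stamp and one byte of state. */
typedef struct ToyPage
{
    uint64_t    lsn;
    uint8_t     bits;
} ToyPage;

/* A record must be applied only if the page has not already seen it. */
static bool
toy_needs_redo(const ToyPage *page, uint64_t rec_lsn)
{
    return rec_lsn > page->lsn;
}

/*
 * Replay a toy "insert frozen tuples and set the VM" record.
 * The heap page is handled first, then the VM page, and each
 * page is checked against its own LSN.
 */
static void
toy_replay(ToyPage *heap, ToyPage *vm, uint64_t rec_lsn)
{
    assert(heap != NULL && vm != NULL);

    if (toy_needs_redo(heap, rec_lsn))
    {
        heap->bits |= TOY_PD_ALL_VISIBLE;
        heap->lsn = rec_lsn;
    }

    /* The VM page may still need the change even if the heap page did not. */
    if (toy_needs_redo(vm, rec_lsn))
    {
        vm->bits |= TOY_VM_BITS;
        vm->lsn = rec_lsn;
    }
}
```

With a heap page already stamped at LSN 200 and a VM page at LSN 50, replaying a record at LSN 100 leaves the heap page untouched but still sets the VM bits and stamps the VM page with LSN 100.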

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 * going to add further frozen rows to it.
 		 *
 		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
+		 * page, mark it as all-frozen and update the visibility map. We're
+		 * already holding a pin on the vmbuffer.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(relation));
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..c2c7e6ab086 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,55 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Read and update the visibility map (VM) block.
+	 *
+	 * We must always redo VM changes, even if the corresponding heap page
+	 * update was skipped due to the LSN interlock. Each VM block covers
+	 * multiple heap pages, so later WAL records may update other bits in the
+	 * same block. If this record includes a full-page image (FPI), subsequent
+	 * WAL records may depend on it to guard against torn pages.
+	 *
+	 * Heap page changes are replayed first to preserve the invariant:
+	 * PD_ALL_VISIBLE must be set on the heap page if the VM bit is set.
+	 *
+	 * Note that we released the heap page lock above. Under normal operation,
+	 * this would be unsafe — a concurrent modification could clear
+	 * PD_ALL_VISIBLE while the VM bit remained set, violating the invariant.
+	 *
+	 * During recovery, however, no concurrent writers exist. Therefore,
+	 * updating the VM without holding the heap page lock is safe enough. This
+	 * same approach is taken when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN,
+								 relname);
+
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		pfree(relname);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..2d43147ffb7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,71 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set visibility map (VM) flags in the block referenced by vmBuf.
+ *
+ * This function is intended for callers that log VM changes together
+ * with the heap page modifications that rendered the page all-visible.
+ * Callers that log VM changes separately should use visibilitymap_set().
+ *
+ * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
+ * corresponding to heapBlk.
+ *
+ * In normal operation (not recovery), this must be called inside a critical
+ * section that also applies the necessary heap page changes and, if
+ * applicable, emits WAL.
+ *
+ * The caller is responsible for ensuring consistency between the heap page
+ * and the VM page by holding a pin and exclusive lock on the buffer
+ * containing heapBlk.
+ *
+ * heapRelname is used only for debugging.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags,
+						 const char *heapRelname)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+		 flags, heapRelname, heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags,
+									  const char *heapRelname);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
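
The bit arithmetic in visibilitymap_set_vmbits() from the patch above can be tried in isolation. The sketch below is a toy model of a single VM page, not PostgreSQL code. It assumes two bits per heap block, the default 8 kB block size and a 24-byte page header (so 8168 usable map bytes per page); toy_map, toy_dirty, toy_set_vmbits() and the TOY_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BITS_PER_HEAPBLOCK   2
#define TOY_HEAPBLOCKS_PER_BYTE  (8 / TOY_BITS_PER_HEAPBLOCK)
#define TOY_MAPSIZE              8168   /* assumed: 8192 - 24 byte header */
#define TOY_HEAPBLOCKS_PER_PAGE  (TOY_MAPSIZE * TOY_HEAPBLOCKS_PER_BYTE)

#define TOY_ALL_VISIBLE  0x01
#define TOY_ALL_FROZEN   0x02
#define TOY_VALID_BITS   0x03

static uint8_t toy_map[TOY_MAPSIZE];    /* contents of one VM page */
static int     toy_dirty;               /* stands in for MarkBufferDirty() */

/* Which VM page holds the bits for this heap block? */
static uint32_t
toy_map_block(uint32_t heapBlk)
{
    return heapBlk / TOY_HEAPBLOCKS_PER_PAGE;
}

/*
 * Set flags for heapBlk and return the previous status. Bits are only
 * ever added, and the page is dirtied only if something changed.
 */
static uint8_t
toy_set_vmbits(uint32_t heapBlk, uint8_t flags)
{
    uint32_t    mapByte = (heapBlk % TOY_HEAPBLOCKS_PER_PAGE) /
        TOY_HEAPBLOCKS_PER_BYTE;
    uint8_t     mapOffset = (heapBlk % TOY_HEAPBLOCKS_PER_BYTE) *
        TOY_BITS_PER_HEAPBLOCK;
    uint8_t     status;

    /* Only valid bits, and never all-frozen without all-visible. */
    assert((flags & TOY_VALID_BITS) == flags);
    assert(flags != TOY_ALL_FROZEN);

    status = (toy_map[mapByte] >> mapOffset) & TOY_VALID_BITS;
    if (flags != status)
    {
        toy_map[mapByte] |= (uint8_t) (flags << mapOffset);
        toy_dirty = 1;
    }
    return status;
}
```

Under these assumptions heap block 5 lives in byte 1 at bit offset 2 of VM page 0, and heap block 32672 is the first block covered by VM page 1. Calling toy_set_vmbits(5, TOY_ALL_VISIBLE | TOY_ALL_FROZEN) twice returns 0 the first time and 3 the second time, and only the first call dirties the page.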



  [text/x-patch] v17-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.9K, 3-v17-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From bb2a4c2d6800cd06cc804847b5862f36d8080617 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:38:53 -0400
Subject: [PATCH v17 02/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III

Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that becomes all-visible in vacuum's third phase, record the
visibility map update in the already emitted
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.

Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.

This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c  | 139 +++++++++++++++++-----
 src/backend/access/heap/pruneheap.c    |  56 ++++++++-
 src/backend/access/heap/vacuumlazy.c   | 153 ++++++++++++++++++-------
 src/backend/access/rmgrdesc/heapdesc.c |  11 +-
 src/include/access/heapam.h            |   1 +
 src/include/access/heapam_xlog.h       |  17 ++-
 6 files changed, 302 insertions(+), 75 deletions(-)
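
In log_heap_prune_and_freeze() as changed by this patch, the condition for passing REGBUF_NO_IMAGE and the condition for stamping the heap page LSN are written in two different places but are meant to be exact opposites. The standalone sketch below restates both so that this can be checked mechanically. It is not PostgreSQL code: ToyLogDecision and toy_log_decision() are hypothetical names, and a plain bool stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyLogDecision
{
    bool        skip_fpi;       /* would pass REGBUF_NO_IMAGE */
    bool        bump_heap_lsn;  /* would call PageSetLSN() on the heap page */
} ToyLogDecision;

/*
 * Mirror the two conditions from the patch:
 *   - the FPI can be skipped only if the sole heap change is setting
 *     PD_ALL_VISIBLE and hint-bit changes are not being logged;
 *   - the heap page LSN is advanced for pruning, freezing, or a VM set
 *     with hint-bit logging enabled.
 */
static ToyLogDecision
toy_log_decision(bool do_prune, int nfrozen, bool do_set_vm,
                 bool hint_bits_logged)
{
    ToyLogDecision d;

    assert(nfrozen >= 0);

    d.skip_fpi = !do_prune &&
        nfrozen == 0 &&
        (!do_set_vm || !hint_bits_logged);

    d.bump_heap_lsn = do_prune ||
        nfrozen > 0 ||
        (do_set_vm && hint_bits_logged);

    return d;
}
```

Enumerating all sixteen input combinations shows that skip_fpi is always the negation of bump_heap_lsn, which matches the comment in the patch that the heap page LSN must not be set when the FPI is explicitly skipped.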

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c2c7e6ab086..aaf595e75d6 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+	{
+		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -90,6 +103,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+		bool		do_prune;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -97,11 +111,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +157,104 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		if ((vmflags & VISIBILITYMAP_VALID_BITS))
+			PageSetAllVisible(page);
+
+		MarkBufferDirty(buffer);
+
+		/*
+		 * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
+		 * an FPI if checksums/wal_log_hints are enabled. Advance the page LSN
+		 * only if the record could include an FPI, since recovery skips
+		 * records <= the stamped LSN. Otherwise it might skip an earlier FPI
+		 * needed to repair a torn page.
+		 */
+		if (do_prune || nplans > 0 ||
+			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+			PageSetLSN(page, lsn);
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
-
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+	 * VM, update the freespace map.
+	 *
+	 * Even when no actual space is freed (e.g., when only marking the page
+	 * all-visible or frozen), we still update the FSM. Because the FSM is
+	 * unlogged and maintained heuristically, it often becomes stale on
+	 * standbys. If such a standby is later promoted and runs VACUUM, it will
+	 * skip recalculating free space for pages that were marked all-visible
+	 * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
+	 * propagate overly optimistic free space values upward, causing future
+	 * insertions to select pages that turn out to be unusable. In bulk, this
+	 * can lead to long stalls.
+	 *
+	 * To prevent this, always refresh the FSM’s view when a page becomes
+	 * all-visible or all-frozen.
+	 *
+	 * Do this regardless of whether a full-page image is logged, since FSM
+	 * data is not part of the page itself.
 	 *
-	 * Do this regardless of a full-page image being applied, since the FSM
-	 * data is not in the page anyway.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			(vmflags & VISIBILITYMAP_VALID_BITS))
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+		/*
+		 * We want to avoid holding an exclusive lock on the heap buffer while
+		 * doing IO (either of the FSM or the VM), so we'll release the lock
+		 * on the heap buffer before doing either.
+		 */
+		UnlockReleaseBuffer(buffer);
+	}
+
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * We must redo changes to the VM even if the heap page was skipped due to
+	 * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+	 * on replaying changes to the VM.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+		uint8		old_vmbits = 0;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
 
-			UnlockReleaseBuffer(buffer);
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+		pfree(relname);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..9052e1a584c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
+#include "access/visibilitymapdefs.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -835,6 +836,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				conflict_xid = prstate.latest_xid_removed;
 
 			log_heap_prune_and_freeze(relation, buffer,
+									  InvalidBuffer,	/* vmbuffer */
+									  0,	/* vmflags */
 									  conflict_xid,
 									  true, reason,
 									  prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2048,17 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
 						  PruneReason reason,
@@ -2062,6 +2070,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2079,24 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+	bool		do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 * Note that if we explicitly skip an FPI, we must not set the heap page
+	 * LSN later.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!do_set_vm || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (do_set_vm)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2136,6 +2165,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+	{
+		xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+			xlrec.flags |= XLHP_VM_ALL_FROZEN;
+	}
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (do_set_vm)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * We must bump the page LSN if pruning or freezing. If we are only
+	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
+	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+	 * for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab6938d1da1..dfe617a914f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2848,8 +2853,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	/*
+	 * Before marking dead items unused, check whether the page will become
+	 * all-visible once that change is applied. This lets us reap the tuples
+	 * and mark the page all-visible within the same critical section,
+	 * enabling both changes to be emitted in a single WAL record. Since the
+	 * visibility checks may perform I/O and allocate memory, they must be
+	 * done outside the critical section.
+	 */
+	if (heap_page_would_be_all_visible(vacrel, buffer,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2879,6 +2909,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	/*
+	 * The page is guaranteed to have had dead line pointers, so
+	 * PD_ALL_VISIBLE cannot be already set. Therefore, whenever we set the VM
+	 * bit, we must also set PD_ALL_VISIBLE. The heap page lock is held while
+	 * updating the VM to ensure consistency.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer, vmflags,
+								 RelationGetRelationName(vacrel->rel));
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2888,7 +2933,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  vmbuffer, vmflags,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
@@ -2897,39 +2943,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3598,30 +3617,74 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() for use by callers that expect
+ * no LP_DEAD items on the page.
  */
 static bool
 heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid,
 						 bool *all_frozen)
 {
+
+	return heap_page_would_be_all_visible(vacrel, buf,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid);
+}
+
+/*
+ * Check whether the heap page in buf is all-visible except for the dead
+ * tuples referenced in the deadoffsets array.
+ *
+ * The visibility checks may perform IO and allocate memory so they must not
+ * be done in a critical section. This function is used by vacuum to determine
+ * if the page will be all-visible once it reaps known dead tuples. That way
+ * it can do both in the same critical section and emit a single WAL record.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * Output parameters:
+ *
+ *  - *all_frozen: true if every tuple on the page is frozen
+ *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is closely related to heap_prune_record_unchanged_lp_normal().
+ * If you modify this function, ensure consistency with that code. An
+ * assertion cross-checks that both remain in agreement. Do not introduce new
+ * side-effects.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid)
+{
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3649,9 +3712,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+		{
+			uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e60d34dad25..8cbff6ab0eb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -382,6 +382,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
 									  PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..16c2b2e3c9c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,7 +292,7 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
@@ -330,6 +330,15 @@ typedef struct xl_heap_prune
 #define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
 #define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
 
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define		XLHP_VM_ALL_VISIBLE			(1 << 8)
+#define		XLHP_VM_ALL_FROZEN			(1 << 9)
+
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
  * (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -497,7 +506,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0

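For readers following the patch above: the new part of
heap_page_would_be_all_visible() is that it walks the line pointers once
and consumes the sorted deadoffsets[] array in step with that walk. A
minimal standalone sketch of the matching logic follows. ItemState and
would_be_all_visible() are hypothetical stand-ins rather than PostgreSQL
definitions, and the real function additionally performs tuple visibility
checks and tracks all_frozen and visibility_cutoff_xid, which are left
out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the states a line pointer can be in. */
typedef enum
{
	ITEM_UNUSED,				/* unused or redirect: of no interest */
	ITEM_VISIBLE,				/* normal tuple, visible to everyone */
	ITEM_NOT_VISIBLE,			/* normal tuple, not yet visible to everyone */
	ITEM_DEAD					/* LP_DEAD line pointer */
} ItemState;

/*
 * Return true if the page would be all-visible once every offset listed in
 * deadoffsets[] has been reaped. Offsets are 1-based and must be strictly
 * sorted, so one cursor (matched) is enough to pair them with the page scan.
 */
static bool
would_be_all_visible(const ItemState *items, int nitems,
					 const uint16_t *deadoffsets, int ndeadoffsets)
{
	int			matched = 0;

	assert(ndeadoffsets == 0 || deadoffsets != NULL);
	for (int i = 1; i < ndeadoffsets; i++)
		assert(deadoffsets[i - 1] < deadoffsets[i]);

	for (int offnum = 1; offnum <= nitems; offnum++)
	{
		ItemState	state = items[offnum - 1];

		if (state == ITEM_UNUSED || state == ITEM_VISIBLE)
			continue;
		if (state == ITEM_NOT_VISIBLE)
			return false;

		/* Dead item: tolerated only if the caller is about to reap it. */
		if (matched >= ndeadoffsets || deadoffsets[matched] != offnum)
			return false;
		matched++;
	}

	return true;
}
```

With a page of {visible, dead, unused, dead, visible}, passing offsets
{2, 4} yields true, while passing only {2} or an empty list yields false,
because a dead item that will not be reaped keeps the page from being
all-visible.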

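The widening of xl_heap_prune.flags from uint8 to uint16 in the patch
above follows from where the new bits live: XLHP_VM_ALL_VISIBLE and
XLHP_VM_ALL_FROZEN are bits 8 and 9, which do not fit in one byte. The
sketch below shows the translation between those record flags and the
visibility map bits. The decode direction mirrors what heap2_desc() does
in the patch; the encode helper and both function names are assumptions
made for illustration, not code from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Values as defined by the patch above */
#define XLHP_VM_ALL_VISIBLE			(1 << 8)
#define XLHP_VM_ALL_FROZEN			(1 << 9)

/* Hypothetical helper: VM bits -> xl_heap_prune flag bits. */
static uint16_t
vmflags_to_xlhp(uint8_t vmflags)
{
	uint16_t	flags = 0;

	assert((vmflags & ~VISIBILITYMAP_VALID_BITS) == 0);
	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
		flags |= XLHP_VM_ALL_VISIBLE;
	if (vmflags & VISIBILITYMAP_ALL_FROZEN)
		flags |= XLHP_VM_ALL_FROZEN;
	return flags;
}

/*
 * Hypothetical helper: xl_heap_prune flag bits -> VM bits. As in
 * heap2_desc(), all-frozen is only honored together with all-visible.
 */
static uint8_t
xlhp_to_vmflags(uint16_t flags)
{
	uint8_t		vmflags = 0;

	if (flags & XLHP_VM_ALL_VISIBLE)
	{
		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
		if (flags & XLHP_VM_ALL_FROZEN)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}
```

Casting the encoded value to uint8 drops both VM bits, which is the
reason every consumer of the flags field, including
heap_xlog_deserialize_prune_and_freeze(), has to take a uint16.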

  [text/x-patch] v17-0003-Assorted-trivial-heap_page_prune_and_freeze-clea.patch (15.6K, 4-v17-0003-Assorted-trivial-heap_page_prune_and_freeze-clea.patch)
  download | inline diff:
From 6b5fc27f0d80bab1df86a2e6fb51b64fd20c3cbb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v17 03/15] Assorted trivial heap_page_prune_and_freeze cleanup

Group heap_page_prune_and_freeze() input parameters in a struct and
clean up their documentation.

Rename a member of PruneState and disambiguate some local
heap_page_prune_and_freeze() variables.
---
 src/backend/access/heap/pruneheap.c  | 112 +++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  16 ++--
 src/include/access/heapam.h          |  62 ++++++++++++---
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 114 insertions(+), 77 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9052e1a584c..be42d3c3272 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
+	bool		attempt_freeze;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -254,15 +254,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
 			OffsetNumber dummy_off_loc;
+			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.relation = relation;
+			params.buffer = buffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
+
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			params.options = 0;
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -304,60 +312,43 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params->cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When the
+ * HEAP_PAGE_PRUNE_FREEZE option is set, we also set presult->all_visible and
+ * presult->all_frozen on exit, to indicate if the VM bits can be set.  They
+ * are always set to false when the option is not set, because at the moment
+ * only callers that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with the oldest values present on the page after pruning.  After processing
+ * the whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -366,15 +357,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
-	bool		hint_bit_fpi;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -395,7 +387,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -442,7 +434,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -468,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -556,7 +548,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -664,7 +656,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -672,7 +664,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -703,16 +695,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 * Freezing would make the page all-frozen.  Have already
 				 * emitted an FPI or will do so anyway?
 				 */
-				if (RelationNeedsWAL(relation))
+				if (RelationNeedsWAL(params->relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_prune)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -754,7 +746,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_prune)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -797,7 +789,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -835,11 +827,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -897,7 +889,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1479,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index dfe617a914f..b25050d6773 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1956,10 +1956,16 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
+	params.relation = rel;
+	params.buffer = buf;
+	params.reason = PRUNE_VACUUM_SCAN;
+	params.cutoffs = &vacrel->cutoffs;
+	params.vistest = vacrel->vistest;
+
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
 	 *
@@ -1975,12 +1981,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8cbff6ab0eb..74a5c24002b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It is used to set the WAL record
+	 * opcode which is used for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+	 * pruning.
+	 *
+	 * FREEZE indicates that we will also freeze tuples, and will return
+	 * 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
+	 * beginning of vacuuming the relation.  Required if
+	 * HEAP_PAGE_PRUNE_FREEZE is set.  cutoffs->OldestXmin is also used to
+	 * determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 02b5b041c45..20f45232175 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2342,6 +2342,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0

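A note on the parameter-struct refactoring in v17-0003: the patch fills
PruneFreezeParams by assigning each member explicitly at both call
sites. One alternative, sketched below with hypothetical Demo* types that
only model the shape of the real struct, is to build the struct with C99
designated initializers, so that any member a caller omits (for example
cutoffs for on-access pruning) is zero-initialized instead of left
indeterminate. This is an illustration of the pattern, not a proposed
change to the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option bits, modeled on HEAP_PAGE_PRUNE_* */
#define DEMO_PRUNE_MARK_UNUSED_NOW	(1 << 0)
#define DEMO_PRUNE_FREEZE			(1 << 1)

typedef enum
{
	DEMO_ON_ACCESS,
	DEMO_VACUUM_SCAN
} DemoReason;

typedef struct DemoCutoffs
{
	unsigned int oldest_xmin;
} DemoCutoffs;

/* Hypothetical stand-in for PruneFreezeParams */
typedef struct DemoParams
{
	int			buffer;
	DemoReason	reason;
	int			options;
	const DemoCutoffs *cutoffs;
} DemoParams;

/* On-access pruning: no options, no cutoffs; omitted members become 0/NULL. */
static DemoParams
demo_on_access_params(int buffer)
{
	return (DemoParams) {.buffer = buffer, .reason = DEMO_ON_ACCESS};
}

/* VACUUM's first pass: always freeze, and reap immediately if no indexes. */
static DemoParams
demo_vacuum_params(int buffer, const DemoCutoffs *cutoffs, int nindexes)
{
	DemoParams	params = {
		.buffer = buffer,
		.reason = DEMO_VACUUM_SCAN,
		.options = DEMO_PRUNE_FREEZE,
		.cutoffs = cutoffs,
	};

	if (nindexes == 0)
		params.options |= DEMO_PRUNE_MARK_UNUSED_NOW;
	return params;
}

/* Freezing requires cutoffs, matching the documented contract. */
static bool
demo_params_valid(const DemoParams *params)
{
	assert(params != NULL);
	if ((params->options & DEMO_PRUNE_FREEZE) != 0 && params->cutoffs == NULL)
		return false;
	return true;
}
```

Grouping the inputs this way also gives one place to check cross-field
requirements such as "cutoffs must be set when freezing", which the
original ten-argument signature could only state in a comment.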


  [text/x-patch] v17-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 5-v17-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From f3dc6eda58a61482f36786dda6e2aaa22c0e0f0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v17 08/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2f719108ad2..941b989ec50 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,9 +1878,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1897,13 +1900,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(vacrel->rel));
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v17-0006-Make-heap_page_is_all_visible-independent-of-LVR.patch (6.6K, 6-v17-0006-Make-heap_page_is_all_visible-independent-of-LVR.patch)
  download | inline diff:
From 86193a71d2ff9649b5b1c1e6963bd610285ad369 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 3 Oct 2025 15:57:02 -0400
Subject: [PATCH v17 06/15] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 59 ++++++++++++++++++----------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56a0286662b..c2618c6449c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,19 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid);
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2019,8 +2025,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2880,9 +2887,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * visibility checks may perform I/O and allocate memory, they must be
 	 * done outside the critical section.
 	 */
-	if (heap_page_would_be_all_visible(vacrel, buffer,
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid))
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
@@ -3626,15 +3635,19 @@ dead_items_cleanup(LVRelState *vacrel)
  * callers that expect no LP_DEAD on the page.
  */
 static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(vacrel, buf,
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid);
+										  visibility_cutoff_xid,
+										  logging_offnum);
 }
 
 /*
@@ -3649,10 +3662,14 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
+ * OldestXmin is used to determine visibility.
+ *
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
  *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *logging_offnum: OffsetNumber of current tuple being processed;
+ *     used by vacuum's error callback system.
  *
  * Callers looking to verify that the page is already all-visible can call
  * heap_page_is_all_visible().
@@ -3663,11 +3680,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * side-effects.
  */
 static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid)
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3702,7 +3721,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3732,10 +3751,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3754,8 +3772,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+					if (!TransactionIdPrecedes(xmin, OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3790,7 +3807,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
-- 
2.43.0
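
The hunks above replace the LVRelState argument with an
OffsetNumber *logging_offnum output parameter, so the visibility check no
longer needs vacuum's private state. A minimal standalone sketch of that
pattern follows. DemoOffset, DEMO_INVALID_OFFSET, DEMO_FIRST_OFFSET, and
demo_page_all_visible() are hypothetical stand-ins, not PostgreSQL
definitions, and the per-item check is reduced to a boolean array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's OffsetNumber machinery. */
typedef uint16_t DemoOffset;
#define DEMO_INVALID_OFFSET ((DemoOffset) 0)
#define DEMO_FIRST_OFFSET   ((DemoOffset) 1)

/*
 * Returns true if every item is visible.  While scanning, *logging_offnum
 * holds the offset of the item being examined, so a caller-owned error
 * callback can report it.  It is reset once the scan is finished.
 */
static bool
demo_page_all_visible(const bool *item_visible, size_t nitems,
                      DemoOffset *logging_offnum)
{
    bool        all_visible = true;

    assert(logging_offnum != NULL);

    for (size_t i = 0; i < nitems && all_visible; i++)
    {
        /* Publish the current position before doing any work on the item. */
        *logging_offnum = (DemoOffset) (DEMO_FIRST_OFFSET + i);

        if (!item_visible[i])
            all_visible = false;
    }

    /* Clear the offset information once the page has been processed. */
    *logging_offnum = DEMO_INVALID_OFFSET;

    return all_visible;
}
```

A caller that owns the error-context state passes the address of its own
offset field, as the patch does with &vacrel->offnum.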



  [text/x-patch] v17-0004-Add-helper-for-freeze-determination-to-heap_page.patch (7.0K, 7-v17-0004-Add-helper-for-freeze-determination-to-heap_page.patch)
  download | inline diff:
From c69a5219a9b792f3c9f6dc730b8810a88d088ae6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v17 04/15] Add helper for freeze determination to
 heap_page_prune_and_freeze

After scanning the line pointers on the heap page during vacuum's first
phase, we use the state collected during that scan to decide whether to
execute the freeze plans we assembled.

Move this decision into a helper for better readability.
---
 src/backend/access/heap/pruneheap.c | 196 +++++++++++++++++-----------
 1 file changed, 117 insertions(+), 79 deletions(-)
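
A condensed model of the decision that heap_page_will_freeze() makes,
written as a standalone sketch. DemoFreezeInputs and demo_will_freeze()
are hypothetical names. The real helper reads these values from PruneState
and calls RelationNeedsWAL(), XLogHintBitIsNeeded(), and
XLogCheckBufferNeedsBackup() itself. The sketch omits the side effects
(heap_pre_freeze_checks() and resetting all_frozen/nfrozen).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened inputs for illustration only. */
typedef struct DemoFreezeInputs
{
    bool        attempt_freeze;      /* caller asked for freezing */
    bool        freeze_required;     /* XID/MXID older than the cutoffs */
    bool        all_visible;
    bool        all_frozen;
    int         nfrozen;             /* number of freeze plans prepared */
    bool        needs_wal;           /* models RelationNeedsWAL() */
    bool        did_tuple_hint_fpi;  /* hint-bit update already emitted an FPI */
    bool        do_prune;
    bool        do_hint_prune;
    bool        hint_bits_logged;    /* models XLogHintBitIsNeeded() */
    bool        buffer_needs_backup; /* models XLogCheckBufferNeedsBackup() */
} DemoFreezeInputs;

static bool
demo_will_freeze(const DemoFreezeInputs *in)
{
    assert(in != NULL);

    if (!in->attempt_freeze)
        return false;

    /* Mandatory freeze: needed to advance relfrozenxid/relminmxid. */
    if (in->freeze_required)
        return true;

    /* Opportunistic freeze only if it would make the page all-frozen. */
    if (!(in->all_visible && in->all_frozen && in->nfrozen > 0))
        return false;

    if (!in->needs_wal)
        return false;

    /* Freeze only if an FPI was emitted already or will be anyway. */
    if (in->did_tuple_hint_fpi)
        return true;
    if (in->do_prune)
        return in->buffer_needs_backup;
    if (in->do_hint_prune)
        return in->hint_bits_logged && in->buffer_needs_backup;

    return false;
}
```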

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index be42d3c3272..44214a57ecd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -302,6 +302,118 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * Decide whether to go ahead with freezing according to the freeze plans we
+ * prepared for the given heap buffer. If the caller specified that we should
+ * not freeze tuples, exit early. Otherwise, if we decide to freeze, run the
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have been decided
+ * before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -663,85 +775,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(params->relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_prune)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
+									  did_tuple_hint_fpi,
+									  do_prune,
+									  do_hint_prune,
+									  &prstate);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
-- 
2.43.0



  [text/x-patch] v17-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patch (40.7K, 8-v17-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patch)
  download | inline diff:
From dde0dfc578137f7c93f9a0e34af38dcdb841b080 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v17 07/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum
 prune/freeze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum’s prune/freeze work, not to pruning
performed during normal page access.

Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/heapam_xlog.c |  41 ++-
 src/backend/access/heap/pruneheap.c   | 429 ++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c  | 205 +-----------
 src/include/access/heapam.h           |  41 ++-
 4 files changed, 414 insertions(+), 302 deletions(-)
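
The redo-side change in heapam_xlog.c separates three decisions: whether to
set PD_ALL_VISIBLE, whether to dirty the heap buffer, and whether to
advance the page LSN. The sketch below models only that decision table.
DemoRedoActions and demo_redo_heap_page_actions() are hypothetical names,
and the boolean parameters stand in for the checks the patch performs on
vmflags, PageIsAllVisible(), and XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: what redo would do to the heap page. */
typedef struct DemoRedoActions
{
    bool        set_pd_all_visible;
    bool        mark_buffer_dirty;
    bool        set_lsn;
} DemoRedoActions;

static DemoRedoActions
demo_redo_heap_page_actions(bool do_prune, int nplans,
                            bool vm_bits_in_record,
                            bool page_already_all_visible,
                            bool hint_bits_logged)
{
    DemoRedoActions a = {false, false, false};

    assert(nplans >= 0);

    /* Pruning or freezing always modifies the heap page. */
    if (do_prune || nplans > 0)
        a.mark_buffer_dirty = a.set_lsn = true;

    /*
     * Setting PD_ALL_VISIBLE dirties the page, but the LSN advances only
     * when the record could have carried an FPI.  If the flag is already
     * set and nothing else changed, the heap page is left untouched.
     */
    if (vm_bits_in_record && !page_already_all_visible)
    {
        a.set_pd_all_visible = true;
        a.mark_buffer_dirty = true;
        if (hint_bits_logged)
            a.set_lsn = true;
    }

    return a;
}
```

With a VM-only record on a page whose PD_ALL_VISIBLE is already set, all
three fields come back false, which corresponds to the "no need to dirty
the heap page" case described in the patch's comment.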

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index aaf595e75d6..f6624bc98d0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,20 +159,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if ((vmflags & VISIBILITYMAP_VALID_BITS))
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
-		 * an FPI if checksums/wal_log_hints are enabled. Advance the page LSN
-		 * only if the record could include an FPI, since recovery skips
-		 * records <= the stamped LSN. Otherwise it might skip an earlier FPI
-		 * needed to repair a torn page.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * If this record only sets the VM, no need to dirty the heap page.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * Always emit a WAL record when setting PD_ALL_VISIBLE but only
+			 * emit an FPI if checksums/wal_log_hints are enabled. Advance the
+			 * page LSN only if the record could include an FPI, since
+			 * recovery skips records <= the stamped LSN. Otherwise it might
+			 * skip an earlier FPI needed to repair a torn page.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5892ed5a07e..f70563008e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -133,17 +135,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze(), to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether or not to opportunistically
-	 * freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -174,6 +176,19 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
+
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -259,6 +274,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
+			params.vmbuffer = InvalidBuffer;
+			params.blk_known_av = false;
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -431,10 +448,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -449,12 +564,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * it's required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -479,6 +595,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -488,15 +605,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
 	prstate.mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = params->cutoffs;
 
 	/*
@@ -543,50 +667,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
+	 *
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * when we encounter LP_DEAD items. Instead, we correct all_visible after
+	 * deciding whether to freeze, but before updating the VM, to avoid
+	 * setting the VM bit incorrectly.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.attempt_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -818,6 +946,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -838,14 +995,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -859,66 +1019,91 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  RelationGetRelationName(params->relation));
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
 									  prstate.nowunused, prstate.nunused);
-		}
 	}
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -2060,6 +2245,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use that xmax as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		conflict_xid = InvalidTransactionId;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2084,6 +2327,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2093,6 +2340,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2127,7 +2375,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2248,7 +2496,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
 	 * for page hint updates.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c2618c6449c..2f719108ad2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
@@ -1971,6 +1966,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	params.reason = PRUNE_VACUUM_SCAN;
 	params.cutoffs = &vacrel->cutoffs;
 	params.vistest = vacrel->vistest;
+	params.vmbuffer = vmbuffer;
+	params.blk_known_av = all_visible_according_to_vm;
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1987,7 +1984,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	params.options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -2010,33 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2070,168 +2040,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2950,6 +2778,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmbuffer, vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -3634,7 +3463,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Wrapper for heap_page_would_be_all_visible() which can be used for
  * callers that expect no LP_DEAD on the page.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId OldestXmin,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 74a5c24002b..2de39ba0cd1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
 	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
 	 * pruning.
 	 *
-	 * FREEZE indicates that we will also freeze tuples, and will return
-	 * 'all_visible', 'all_frozen' flags to the caller.
+	 * FREEZE indicates that we will also freeze tuples.
+	 *
+	 * UPDATE_VIS indicates that we will set the page's status in the VM.
 	 */
 	int			options;
 
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -423,6 +431,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
@@ -433,6 +442,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
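
For readers following along without a PostgreSQL tree, the horizon
selection done by get_conflict_xid() in the patch above can be tried in
isolation. The following is only a sketch under stated assumptions:
xid_follows() is a simplified stand-in for TransactionIdFollows(), the
typedef and the two constants mirror the usual PostgreSQL definitions,
and sketch_conflict_xid() is a hypothetical name, not something the
patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumed, for illustration). */
typedef uint32_t TransactionId;
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/*
 * Simplified stand-in for TransactionIdFollows(): special XIDs compare
 * as plain unsigned integers, normal XIDs compare modulo 2^32.
 */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical mirror of the selection logic in get_conflict_xid(). */
static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid,
					bool blk_already_av, bool set_blk_all_frozen)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Start from the VM horizon if setting the VM, else the freeze one. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A younger removed xmax takes precedence over what we have so far. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* Only marking an already all-visible page all-frozen: no horizon. */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

With these stand-ins, pruning a tuple whose xmax is 120 while setting
the VM with a visibility cutoff of 100 yields 120, whereas a VM-only
update that marks an already all-visible page all-frozen yields
InvalidTransactionId.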



  [text/x-patch] v17-0005-Update-PruneState.all_-visible-frozen-earlier-in.patch (14.8K, 9-v17-0005-Update-PruneState.all_-visible-frozen-earlier-in.patch)
  download | inline diff:
From d4a4be3eed25853fc1ea84ebc2cbe0226afd823a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v17 05/15] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen when dead items are present. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags promptly avoids extra bookkeeping in
heap_prune_unchanged_lp_normal(). At present this has no runtime effect
because all callers that consider setting the VM also attempt freezing,
but future callers (e.g. on-access pruning) may want to set the VM
without preparing freeze plans.

We also used to defer clearing all_visible and all_frozen until after
computing the visibility cutoff XID. By determining the cutoff earlier,
we can update these flags immediately after deciding whether to
opportunistically freeze. This is necessary if we want to set the VM in
the same WAL record that prunes and freezes tuples on the page.

While we are at it, unset all_frozen whenever we unset all_visible.
Previously, all_frozen could only be trusted in combination with
all_visible, because all_frozen was not unset when tuples that are not
visible to all were encountered. It is best to keep both flags up to
date to avoid mistakes when using all_frozen.
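
The flag handling described above can be modeled in a few lines. This is
a sketch only: SketchPruneState and the sketch_* functions are
hypothetical names, the struct keeps just the fields needed for the
illustration, and it assumes a caller that tracks visibility even when
it will not attempt freezing (which no current caller does).

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the PruneState fields involved. */
typedef struct SketchPruneState
{
	bool		attempt_freeze; /* will we consider freezing the page? */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} SketchPruneState;

/* Assumption: visibility is tracked regardless of attempt_freeze. */
static void
sketch_init(SketchPruneState *s, bool attempt_freeze)
{
	s->attempt_freeze = attempt_freeze;
	s->all_visible = true;
	s->all_frozen = true;
	s->lpdead_items = 0;
}

/*
 * An LP_DEAD item was seen.  When freezing may be attempted, leave the
 * flags alone so the freeze decision is unaffected; otherwise clear
 * both right away.
 */
static void
sketch_record_dead(SketchPruneState *s)
{
	if (!s->attempt_freeze)
		s->all_visible = s->all_frozen = false;
	s->lpdead_items++;
}

/* A tuple that is not visible to all always clears both flags. */
static void
sketch_record_not_all_visible(SketchPruneState *s)
{
	s->all_visible = s->all_frozen = false;
}

/*
 * Called once the freeze decision has been made: the flags must now
 * reflect the real page state, so LP_DEAD items clear them.
 */
static void
sketch_after_freeze_decision(SketchPruneState *s)
{
	if (s->lpdead_items > 0)
		s->all_visible = s->all_frozen = false;
}
```

In this model all_frozen is never left set once all_visible has been
cleared, which is the invariant the patch asserts before entering the
critical section.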
---
 src/backend/access/heap/pruneheap.c  | 144 ++++++++++++++-------------
 src/backend/access/heap/vacuumlazy.c |   9 +-
 2 files changed, 77 insertions(+), 76 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 44214a57ecd..5892ed5a07e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,15 +138,12 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze(), to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether or not to opportunistically
+	 * freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -309,7 +306,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * pre-freeze checks.
  *
  * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have been decided
- * before calling this function.
+ * before calling this function. *frz_conflict_horizon is set to the snapshot
+ * conflict horizon for the WAL record should we decide to freeze tuples.
  *
  * prstate is an input/output parameter.
  *
@@ -321,7 +319,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -358,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -389,6 +390,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it.  Otherwise we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -433,10 +450,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
  * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set.  They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set.  They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -472,6 +490,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
@@ -541,10 +560,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -779,8 +798,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
 
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -839,27 +876,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -885,30 +903,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
@@ -1288,8 +1284,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1415,7 +1414,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1437,7 +1436,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1450,7 +1449,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1469,7 +1468,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1487,7 +1486,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1555,8 +1554,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b25050d6773..56a0286662b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2012,7 +2012,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2065,6 +2064,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2170,11 +2170,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0



  [text/x-patch] v17-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.2K, 10-v17-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 078c1a636f208dee878fa4d78b6e05006513008a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v17 10/15] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 16 ++++++++--------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 21b24f3992e..f1e137a387d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -728,9 +728,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1153,11 +1153,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1615,7 +1615,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
+				 * could use GlobalVisXidVisibleToAll() instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v17-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.4K, 11-v17-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
  download | inline diff:
From 783f1f53b90bc12ac025b68125e3cd85706c71fb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v17 11/15] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
 src/backend/access/heap/pruneheap.c         | 37 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 17 +++++-----
 src/include/access/heapam.h                 |  7 ++--
 4 files changed, 57 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f1e137a387d..671236ee23f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -710,11 +710,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -910,6 +911,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1080,10 +1091,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1612,19 +1622,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisXidVisibleToAll() instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b20c96033e..3e9cf2f15a4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2740,7 +2740,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3489,14 +3489,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3515,7 +3514,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3534,7 +3533,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3606,7 +3605,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3625,7 +3624,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2de39ba0cd1..df0632aebc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
 	/*
 	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
 	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
-	 * option is set. cutoffs->OldestXmin is also used to determine if dead
-	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 * option is set.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -443,7 +442,7 @@ extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -455,6 +454,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
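
The "check only once per page" idea from the 0011 commit message can be sketched in isolation. This is a simplified illustration, not code from the patch: Xid, xid_visible_to_all(), page_all_visible() and the expensive_checks counter are hypothetical stand-ins, the horizon is modelled as a single value rather than a GlobalVisState, and XID wraparound is ignored. It only shows that tracking the newest xmin while scanning lets the expensive test run once per page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;			/* hypothetical stand-in for TransactionId */

/* Counts calls to the "expensive" test, purely for illustration. */
static int	expensive_checks = 0;

/*
 * Hypothetical stand-in for the GlobalVisState test: an xid is visible to
 * all if it is older than the horizon.  Wraparound is ignored here.
 */
static bool
xid_visible_to_all(Xid horizon, Xid xid)
{
	expensive_checks++;
	return xid < horizon;
}

/*
 * Scan the xmins of the live tuples on a page, remembering only the newest
 * one, then test that single value against the horizon.  If the newest xmin
 * is visible to all, every older xmin on the page is too.
 */
static bool
page_all_visible(const Xid *xmins, size_t ntuples, Xid horizon)
{
	Xid			newest = 0;		/* 0 plays the role of "no xmin tracked" */

	for (size_t i = 0; i < ntuples; i++)
	{
		if (xmins[i] > newest)
			newest = xmins[i];
	}

	if (newest == 0)
		return true;			/* nothing to check */

	return xid_visible_to_all(horizon, newest);
}
```

For a page with xmins {5, 9, 7}, a horizon of 10 yields all-visible and a horizon of 9 does not, with one call to the expensive test per page in both cases.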



  [text/x-patch] v17-0012-Inline-TransactionIdFollows-Precedes.patch (5.0K, 12-v17-0012-Inline-TransactionIdFollows-Precedes.patch)
  download | inline diff:
From e412f9298b0735d1091f4769ace4d2d1a7e62312 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v17 12/15] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0
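
For readers unfamiliar with the modulo-2^32 comparison that 0012 moves into the header, here is a standalone sketch of the same logic. The name xid_precedes() and the local typedef and macros are assumptions made so the example compiles outside the PostgreSQL tree; the comparison itself mirrors TransactionIdPrecedes() as shown in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are permanent (special) XIDs. */
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/*
 * Is id1 logically < id2?  Permanent XIDs compare as plain unsigned
 * integers.  Two normal XIDs are compared modulo 2^32, so an XID from just
 * before a wraparound still counts as older than one from just after it.
 */
static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return (id1 < id2);

	diff = (int32_t) (id1 - id2);
	return (diff < 0);
}
```

With this definition, xid_precedes(100, 200) is true, and so is xid_precedes(4294967290, 10), because the unsigned difference wraps to a negative signed value.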



  [text/x-patch] v17-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.9K, 13-v17-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 54fcba140e515eba0eb1f9d48e7d5875b92e7e39 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v17 13/15] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 71 +++++++++++++++----
 src/backend/access/index/indexam.c            | 46 ++++++++++++
 src/backend/access/table/tableam.c            | 39 ++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 +++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 282 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 671236ee23f..05e6b902069 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -186,7 +186,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -201,9 +203,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -269,12 +275,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.options = 0;
+			params.vmbuffer = InvalidBuffer;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			params.relation = relation;
 			params.buffer = buffer;
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
-			params.vmbuffer = InvalidBuffer;
 			params.blk_known_av = false;
 
 			/*
@@ -454,6 +469,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -464,7 +482,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -480,6 +500,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -503,6 +540,9 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * XXX: This will never trigger for on-access pruning because it passes
+	 * blk_known_av as false. Should we remove that condition here?
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -911,6 +951,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -921,14 +969,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -972,6 +1012,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2244,7 +2285,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2313,8 +2354,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index df0632aebc6..59d8ce9ad42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read the VM block corresponding to
+	 * the current heap page into this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read the VM block corresponding to the current heap page
+	 * into this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
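
For readers skimming the diff above, here is a minimal, standalone sketch of
the two decisions it adds: a scan only gets permission to set the VM when the
query does not modify the relation, and on-access pruning declines to set the
VM when that would be the only reason to dirty the heap page or to emit a
full-page image. It is not code from the patch. Every Toy*/toy_* name is
hypothetical, the flag values do not match the real ScanOptions, and the
boolean fields only stand in for BufferIsDirty() and
XLogCheckBufferNeedsBackup(). The real heap_page_will_set_vis() also handles
the vacuum paths and VM corruption checks, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values do not match PostgreSQL's ScanOptions. */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_ALLOW_VM_SET = 1 << 1
};

typedef enum
{
	TOY_PRUNE_ON_ACCESS,
	TOY_PRUNE_VACUUM_SCAN
} ToyPruneReason;

typedef struct ToyPage
{
	bool		all_visible;	/* every tuple is visible to all snapshots */
	bool		buffer_dirty;	/* models BufferIsDirty() */
	bool		needs_fpi;		/* models XLogCheckBufferNeedsBackup() */
} ToyPage;

/* Mirrors the *_vmset() wrappers: only non-modifying scans may set the VM. */
uint32_t
toy_scan_flags(bool modifies_rel)
{
	uint32_t	flags = TOY_SO_TYPE_SEQSCAN;

	if (!modifies_rel)
		flags |= TOY_SO_ALLOW_VM_SET;
	return flags;
}

/* Simplified model of the on-access branch added to heap_page_will_set_vis(). */
bool
toy_will_set_vm(uint32_t scan_flags, ToyPruneReason reason,
				bool do_prune, bool do_freeze, const ToyPage *page)
{
	/* On-access callers without permission never touch the VM. */
	if (reason == TOY_PRUNE_ON_ACCESS && !(scan_flags & TOY_SO_ALLOW_VM_SET))
		return false;

	if (!page->all_visible)
		return false;

	/*
	 * If nothing else is being changed, do not newly dirty the page and do
	 * not force a full-page image just to set the VM bit.
	 */
	if (reason == TOY_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!page->buffer_dirty || page->needs_fpi))
		return false;

	return true;
}

/* Small self-check exercising the four interesting cases. */
void
toy_selfcheck(void)
{
	ToyPage		clean = {true, false, false};
	ToyPage		dirty = {true, true, false};
	uint32_t	ro = toy_scan_flags(false);
	uint32_t	rw = toy_scan_flags(true);

	assert(!toy_will_set_vm(ro, TOY_PRUNE_ON_ACCESS, false, false, &clean));
	assert(toy_will_set_vm(ro, TOY_PRUNE_ON_ACCESS, false, false, &dirty));
	assert(toy_will_set_vm(ro, TOY_PRUNE_ON_ACCESS, true, false, &clean));
	assert(!toy_will_set_vm(rw, TOY_PRUNE_ON_ACCESS, true, false, &dirty));
}
```

Calling toy_selfcheck() from a test harness walks the cases: a clean
all-visible page is left alone when nothing is pruned, an already-dirty page
that needs no FPI gets its VM bit, pruning makes the VM update worthwhile even
on a clean page, and a scan belonging to a modifying query never sets the VM.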



  [text/x-patch] v17-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.4K, 14-v17-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 3f8b38eec729ebe3711cdb850bb768f14029795a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v17 09/15] Remove XLOG_HEAP2_VISIBLE entirely

No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  15 +--
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 56 insertions(+), 377 deletions(-)
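
With the separate record gone, redo of the prune/freeze record is the place
where the VM bits get set during recovery, and the heapam_xlog.c hunk below
only advances the VM page LSN when the bits actually changed. The following
is a rough, standalone model of that "return the old bits, bump the LSN only
on change" pattern. It is an illustration, not the patch's code: ToyVMPage and
the toy_* functions are hypothetical, and only the two-bits-per-heap-block
layout and the 0x01/0x02 flag values follow the real visibility map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE (8 / TOY_BITS_PER_HEAPBLOCK)
#define TOY_ALL_VISIBLE			0x01
#define TOY_ALL_FROZEN			0x02
#define TOY_VM_BYTES			64

/* Hypothetical stand-in for one visibility map page. */
typedef struct ToyVMPage
{
	uint64_t	lsn;					/* models the page LSN */
	uint8_t		map[TOY_VM_BYTES];		/* two status bits per heap block */
} ToyVMPage;

/* Set the given bits for heap_blk and return the bits that were set before. */
uint8_t
toy_vm_set(ToyVMPage *vm, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	uint32_t	shift = (heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK;
	uint8_t		old;

	assert(byte < TOY_VM_BYTES);
	old = (vm->map[byte] >> shift) & 0x03;
	vm->map[byte] |= (uint8_t) (flags << shift);
	return old;
}

/* Redo-side caller: advance the page LSN only if the bits changed. */
void
toy_redo_set_vm(ToyVMPage *vm, uint32_t heap_blk, uint8_t flags,
				uint64_t record_lsn)
{
	uint8_t		old = toy_vm_set(vm, heap_blk, flags);

	if (old != flags)
		vm->lsn = record_lsn;
}
```

Replaying the same update twice against a zeroed ToyVMPage leaves the LSN at
the value written by the first replay, which is the behaviour the "Only set VM
page LSN if we modified the page" comment in the hunk describes.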

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(relation));
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(relation));
 		}
 
 		/*
@@ -8798,50 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f6624bc98d0..aeb97cc3cea 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -258,7 +258,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+		old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -276,142 +276,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -789,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -805,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 relname);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  relname);
 
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		pfree(relname);
@@ -1390,9 +1254,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f70563008e1..21b24f3992e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1029,9 +1029,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  RelationGetRelationName(params->relation));
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   RelationGetRelationName(params->relation));
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2308,14 +2308,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 941b989ec50..1b20c96033e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,11 +1900,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(vacrel->rel));
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(vacrel->rel));
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2784,9 +2784,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 RelationGetRelationName(vacrel->rel));
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  RelationGetRelationName(vacrel->rel));
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2d43147ffb7..51d206e517d 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,107 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set visibility map (VM) flags in the block referenced by vmBuf.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * heapRelname is used only for debugging.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const char *heapRelname)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const char *heapRelname);
+
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 20f45232175..885f9acff39 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4275,7 +4275,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
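
For readers following the rename above: after this patch the only remaining
visibilitymap_set() just flips bits in an already-pinned VM page and returns
the previous bits, leaving WAL logging and LSN updates to the caller. A rough
standalone model of that bit manipulation might look like the following. All
toy_* names are hypothetical, the map is a plain byte array rather than a
buffer page, and the layout (two bits per heap block, four heap blocks per
byte) is an assumption that ignores the page header and map-page boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the VISIBILITYMAP_* flag values shown in the patch. */
#define TOY_ALL_VISIBLE			0x01
#define TOY_ALL_FROZEN			0x02
#define TOY_VALID_BITS			0x03

/* Assumed layout: two bits per heap block, so four heap blocks per byte. */
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	4

/*
 * Set 'flags' for heap block 'blk' in 'map' and return the bits that were
 * set before the call. No locking, no WAL, no LSN: this models only the
 * bit arithmetic.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t blk, uint8_t flags)
{
	uint32_t	map_byte = blk / TOY_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((blk % TOY_HEAPBLOCKS_PER_BYTE) *
										TOY_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Only the two VM bits may be passed in. */
	assert((flags & TOY_VALID_BITS) == flags);
	/* Never set all-frozen without also setting all-visible. */
	assert(flags != TOY_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & TOY_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Returning the old bits is what lets a caller such as
heap_page_prune_and_freeze() compare old_vmbits with new_vmbits and skip
further work when nothing changed.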



  [text/x-patch] v17-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 15-v17-0014-Set-pd_prune_xid-on-insert.patch)
From 58ba42d63128085051847ac1c9d7a88702657c23 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v17 14/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
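
As a rough illustration of the hint described above (not the actual macro
from bufpage.h), the effect of PageSetPrunable() can be modelled as "remember
the oldest XID that might make this page prunable". The toy_* names and the
ToyPage struct below are hypothetical; the comparison is assumed to follow the
usual modulo-2^32 ordering for normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

/* Hypothetical stand-in for the one page header field we care about. */
typedef struct ToyPage
{
	ToyXid		pd_prune_xid;
} ToyPage;

/* Normal XIDs exclude the invalid, bootstrap and frozen special values. */
static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Circular comparison for normal XIDs: does a logically precede b? */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Record xid as the page's prune hint if no hint is set yet, or if xid is
 * older than the hint already stored.
 */
static void
toy_page_set_prunable(ToyPage *page, ToyXid xid)
{
	/* Callers skip special XIDs, e.g. in bootstrap mode. */
	assert(toy_xid_is_normal(xid));

	if (page->pd_prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}
```

With a hint in place, a later reader of the page has a reason to try
on-access pruning once that XID is old enough, which is the point at which
the VM could be set.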

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index aeb97cc3cea..dbbc4a16bd8 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -475,6 +475,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -624,9 +630,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



  [text/x-patch] v17-0015-Split-heap_page_prune_and_freeze-into-helpers.patch (17.8K, 16-v17-0015-Split-heap_page_prune_and_freeze-into-helpers.patch)
From bd3b416719d53c0fa904d0aaae9540b1cce84ec2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 18:45:45 -0400
Subject: [PATCH v17 15/15] Split heap_page_prune_and_freeze into helpers

---
 src/backend/access/heap/pruneheap.c | 316 +++++++++++++++-------------
 1 file changed, 170 insertions(+), 146 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05e6b902069..51674733eaf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -586,82 +586,20 @@ heap_page_will_set_vis(Relation relation,
 	return do_set_vm;
 }
 
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page. If the page's visibility status has changed, update it in
- * the VM.
- *
- * Caller must have pin and buffer cleanup lock on the page.  Note that we
- * don't update the FSM information for page on caller's behalf.  Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
- * it's required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.
- *
- * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
- * the page has changed, we will update the VM at the same time as pruning and
- * freezing the heap page. We will also update presult->old_vmbits and
- * presult->new_vmbits with the state of the VM before and after updating it
- * for the caller to use in bookkeeping.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.  Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far.  They will be updated
- * with oldest values present on the page after pruning.  After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
-						   PruneFreezeResult *presult,
-						   OffsetNumber *off_loc,
-						   TransactionId *new_relfrozen_xid,
-						   MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params, PruneState *prstate,
+				   TransactionId *new_relfrozen_xid,
+				   MultiXactId *new_relmin_mxid,
+				   PruneFreezeResult *presult)
 {
-	Buffer		buffer = params->buffer;
-	Buffer		vmbuffer = params->vmbuffer;
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber offnum,
-				maxoff;
-	PruneState	prstate;
-	HeapTupleData tup;
-	bool		do_freeze;
-	bool		do_prune;
-	bool		do_hint_prune;
-	bool		do_set_vm;
-	bool		do_set_pd_vis;
-	bool		did_tuple_hint_fpi;
-	int64		fpi_before = pgWalUsage.wal_fpi;
-	TransactionId frz_conflict_horizon = InvalidTransactionId;
-	TransactionId conflict_xid = InvalidTransactionId;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
-
 	/* Copy parameters to prstate */
-	prstate.vistest = params->vistest;
-	prstate.mark_unused_now =
+	prstate->vistest = params->vistest;
+	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.attempt_update_vm =
+	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
 		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
-	prstate.cutoffs = params->cutoffs;
+	prstate->cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -674,37 +612,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * prunable, we will save the lowest relevant XID in new_prune_xid. Also
 	 * initialize the rest of our working state.
 	 */
-	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.latest_xid_removed = InvalidTransactionId;
-	prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
-	prstate.nroot_items = 0;
-	prstate.nheaponly_items = 0;
+	prstate->new_prune_xid = InvalidTransactionId;
+	prstate->latest_xid_removed = InvalidTransactionId;
+	prstate->nredirected = prstate->ndead = prstate->nunused = prstate->nfrozen = 0;
+	prstate->nroot_items = 0;
+	prstate->nheaponly_items = 0;
 
 	/* initialize page freezing working state */
-	prstate.pagefrz.freeze_required = false;
-	if (prstate.attempt_freeze)
+	prstate->pagefrz.freeze_required = false;
+	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
-		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
-		prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+		prstate->pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
 	}
 	else
 	{
 		Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
-		prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
 	}
 
-	prstate.ndeleted = 0;
-	prstate.live_tuples = 0;
-	prstate.recently_dead_tuples = 0;
-	prstate.hastup = false;
-	prstate.lpdead_items = 0;
-	prstate.deadoffsets = presult->deadoffsets;
+	prstate->ndeleted = 0;
+	prstate->live_tuples = 0;
+	prstate->recently_dead_tuples = 0;
+	prstate->hastup = false;
+	prstate->lpdead_items = 0;
+	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
 	 * Track whether the page could be marked all-visible and/or all-frozen.
@@ -732,20 +670,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * bookkeeping. In this case, initializing all_visible to false allows
 	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
-	if (prstate.attempt_freeze)
+	if (prstate->attempt_freeze)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = true;
+		prstate->all_visible = true;
+		prstate->all_frozen = true;
 	}
-	else if (prstate.attempt_update_vm)
+	else if (prstate->attempt_update_vm)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = false;
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
 	}
 	else
 	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
 	}
 
 	/*
@@ -757,10 +695,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * used to calculate the snapshot conflict horizon when updating the VM
 	 * and/or freezing all the tuples on the page.
 	 */
-	prstate.visibility_cutoff_xid = InvalidTransactionId;
+	prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
 
-	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(params->relation);
+static void
+prune_freeze_plan(PruneState *prstate, BlockNumber blockno, Buffer buffer, Page page,
+				  OffsetNumber maxoff, OffsetNumber *off_loc, HeapTuple tup)
+{
+	OffsetNumber offnum;
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -795,13 +737,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		*off_loc = offnum;
 
-		prstate.processed[offnum] = false;
-		prstate.htsv[offnum] = -1;
+		prstate->processed[offnum] = false;
+		prstate->htsv[offnum] = -1;
 
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
 			continue;
 		}
 
@@ -811,17 +753,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * If the caller set mark_unused_now true, we can set dead line
 			 * pointers LP_UNUSED now.
 			 */
-			if (unlikely(prstate.mark_unused_now))
-				heap_prune_record_unused(&prstate, offnum, false);
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
 			continue;
 		}
 
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* This is the start of a HOT chain */
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 			continue;
 		}
 
@@ -831,25 +773,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Get the tuple's visibility status and queue it up for processing.
 		 */
 		htup = (HeapTupleHeader) PageGetItem(page, itemid);
-		tup.t_data = htup;
-		tup.t_len = ItemIdGetLength(itemid);
-		ItemPointerSet(&tup.t_self, blockno, offnum);
+		tup->t_data = htup;
+		tup->t_len = ItemIdGetLength(itemid);
+		ItemPointerSet(&tup->t_self, blockno, offnum);
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, tup,
+															buffer);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 		else
-			prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+			prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
 	}
 
-	/*
-	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
-	 * an FPI to be emitted.
-	 */
-	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
 	/*
 	 * Process HOT chains.
 	 *
@@ -861,30 +797,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * the page instead of using the root_items array, also did it in
 	 * ascending offset number order.)
 	 */
-	for (int i = prstate.nroot_items - 1; i >= 0; i--)
+	for (int i = prstate->nroot_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.root_items[i];
+		offnum = prstate->root_items[i];
 
 		/* Ignore items already processed as part of an earlier chain */
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
 	}
 
 	/*
 	 * Process any heap-only tuples that were not already processed as part of
 	 * a HOT chain.
 	 */
-	for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+	for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.heaponly_items[i];
+		offnum = prstate->heaponly_items[i];
 
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
@@ -903,7 +839,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * return true for an XMIN_INVALID tuple, so this code will work even
 		 * when there were sequential updates within the aborted transaction.)
 		 */
-		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+		if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
 		{
 			ItemId		itemid = PageGetItemId(page, offnum);
 			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -911,8 +847,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
 			{
 				HeapTupleHeaderAdvanceConflictHorizon(htup,
-													  &prstate.latest_xid_removed);
-				heap_prune_record_unused(&prstate, offnum, true);
+													  &prstate->latest_xid_removed);
+				heap_prune_record_unused(prstate, offnum, true);
 			}
 			else
 			{
@@ -929,7 +865,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -940,12 +876,110 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	{
 		*off_loc = offnum;
 
-		Assert(prstate.processed[offnum]);
+		Assert(prstate->processed[offnum]);
 	}
 #endif
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate->all_visible &&
+		TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate->vistest, prstate->visibility_cutoff_xid))
+		prstate->all_visible = prstate->all_frozen = false;
+
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
+ *
+ * Caller must have pin and buffer cleanup lock on the page.  Note that we
+ * don't update the FSM information for page on caller's behalf.  Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.  Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+						   PruneFreezeResult *presult,
+						   OffsetNumber *off_loc,
+						   TransactionId *new_relfrozen_xid,
+						   MultiXactId *new_relmin_mxid)
+{
+	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
+	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	OffsetNumber maxoff;
+	PruneState	prstate;
+	HeapTupleData tup;
+	bool		do_freeze;
+	bool		do_prune;
+	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
+	bool		did_tuple_hint_fpi;
+	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
+	maxoff = PageGetMaxOffsetNumber(page);
+	tup.t_tableOid = RelationGetRelid(params->relation);
+
+	/* Initialize needed state in prstate */
+	prune_freeze_setup(params, &prstate, new_relfrozen_xid, new_relmin_mxid, presult);
+
+	/*
+	 * Examine all line pointers and tuple visibility information to determine
+	 * which line pointers should change state and which tuples may be frozen.
+	 * Prepare queue of state changes to later be executed in a critical
+	 * section.
+	 */
+	prune_freeze_plan(&prstate, blockno, buffer, page, maxoff, off_loc, &tup);
+
+	/*
+	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+	 * an FPI to be emitted.
+	 */
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	do_prune = prstate.nredirected > 0 ||
 		prstate.ndead > 0 ||
@@ -959,16 +993,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
-	/*
-	 * After processing all the live tuples on the page, if the newest xmin
-	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
-	 */
-	if (prstate.all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
-		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
-		prstate.all_visible = prstate.all_frozen = false;
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-09 18:18  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2025-10-09 18:18 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Robert Haas <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2025-10-08 18:54:25 -0400, Melanie Plageman wrote:
> +uint8
> +visibilitymap_set_vmbits(BlockNumber heapBlk,
> +						 Buffer vmBuf, uint8 flags,
> +						 const char *heapRelname)
> +{
> +	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
> +	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
> +	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
> +	Page		page;
> +	uint8	   *map;
> +	uint8		status;
> +
> +#ifdef TRACE_VISIBILITYMAP
> +	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
> +		 flags, heapRelname, heapBlk);
> +#endif

I like that it doesn't take a Relation anymore, but I'd just pass the
smgrrelation instead; then you don't need to allocate the string in the
caller when it's approximately never used.

Otherwise this looks pretty close to me.



> @@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
>  	}
>  
>  	/*
> -	 * If we have a full-page image, restore it and we're done.
> +	 * If we have a full-page image of the heap block, restore it and we're
> +	 * done with the heap block.
>  	 */
> -	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
> -										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
> -										   &buffer);
> -	if (action == BLK_NEEDS_REDO)
> +	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
> +									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
> +									  &buffer) == BLK_NEEDS_REDO)
>  	{
>  		Page		page = BufferGetPage(buffer);
>  		OffsetNumber *redirected;

Why move it around this way?


> @@ -138,36 +157,104 @@ heap_xlog_prune_freeze(XLogReaderState *record)
>  		/* There should be no more data */
>  		Assert((char *) frz_offsets == dataptr + datalen);
>  
> +		if ((vmflags & VISIBILITYMAP_VALID_BITS))
> +			PageSetAllVisible(page);
> +
> +		MarkBufferDirty(buffer);
> +
> +		/*
> +		 * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
> +		 * an FPI if checksums/wal_log_hints are enabled.

This comment reads as-if we're WAL logging here, but this is a
Wendy's^Wrecovery.

> Advance the page LSN
> +		 * only if the record could include an FPI, since recovery skips
> +		 * records <= the stamped LSN. Otherwise it might skip an earlier FPI
> +		 * needed to repair a torn page.
> +		 */

This is confusing, should probably just reference the stuff we did in the
!recovery case.


> +		if (do_prune || nplans > 0 ||
> +			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
> +			PageSetLSN(page, lsn);
> +
>  		/*
>  		 * Note: we don't worry about updating the page's prunability hints.
>  		 * At worst this will cause an extra prune cycle to occur soon.
>  		 */

Not your fault, but that seems odd? Why aren't we just doing the right thing?

>  	/*
> -	 * If we released any space or line pointers, update the free space map.
> +	 * If we released any space or line pointers or set PD_ALL_VISIBLE or the
> +	 * VM, update the freespace map.

I'd replace the first or with a , ;)


> +	 * Even when no actual space is freed (e.g., when only marking the page
> +	 * all-visible or frozen), we still update the FSM. Because the FSM is
> +	 * unlogged and maintained heuristically, it often becomes stale on
> +	 * standbys. If such a standby is later promoted and runs VACUUM, it will
> +	 * skip recalculating free space for pages that were marked all-visible
> +	 * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
> +	 * propagate overly optimistic free space values upward, causing future
> +	 * insertions to select pages that turn out to be unusable. In bulk, this
> +	 * can lead to long stalls.
> +	 *
> +	 * To prevent this, always refresh the FSM’s view when a page becomes
> +	 * all-visible or all-frozen.

I'd s/refresh/update/, because refresh sounds more like rereading the current
state of the FSM, rather than changing the FSM.


> +		/* We don't have relation name during recovery, so use relfilenode */
> +		relname = psprintf("%u", rlocator.relNumber);
> +		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
>  
> -			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
> +		/* Only set VM page LSN if we modified the page */
> +		if (old_vmbits != vmflags)
> +		{
> +			Assert(BufferIsDirty(vmbuffer));
> +			PageSetLSN(BufferGetPage(vmbuffer), lsn);
>  		}
> -		else
> -			UnlockReleaseBuffer(buffer);
> +		pfree(relname);

Hm. When can we actually enter the old_vmbits == vmflags case?  It might also
be fine to just say that we don't expect it to change but are mirroring the
code in visibilitymap_set().


I wonder if the VM specific redo portion should be in a common helper? Might
not be enough code to worry though...
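
For illustration only, a standalone sketch of what such a common helper could look like. ToyVMPage, toy_vm_set_bits() and toy_vm_redo_set() are hypothetical names, and the flat byte array is a simplified stand-in for the real VM buffer page; the only behavior taken from the quoted patch is "set the bits, return the old bits, and stamp the LSN only if something changed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real VM lives in buffer pages, not a flat array. */
#define VM_ALL_VISIBLE    0x01
#define VM_ALL_FROZEN     0x02
#define VM_VALID_BITS     0x03
#define BITS_PER_HEAPBLK  2
#define HEAPBLKS_PER_BYTE 4

typedef struct ToyVMPage
{
    uint64_t    lsn;        /* stand-in for the page LSN */
    bool        dirty;      /* stand-in for MarkBufferDirty() */
    uint8_t     map[64];    /* covers 256 heap blocks */
} ToyVMPage;

/* Set flags for heapBlk and return the previous bits for that block. */
static uint8_t
toy_vm_set_bits(ToyVMPage *vm, uint32_t heapBlk, uint8_t flags)
{
    uint32_t    mapByte = heapBlk / HEAPBLKS_PER_BYTE;
    uint8_t     mapOffset = (heapBlk % HEAPBLKS_PER_BYTE) * BITS_PER_HEAPBLK;
    uint8_t     old;

    assert((flags & VM_VALID_BITS) == flags);
    assert(mapByte < sizeof(vm->map));

    old = (vm->map[mapByte] >> mapOffset) & VM_VALID_BITS;
    if (old != flags)
    {
        vm->map[mapByte] |= (uint8_t) (flags << mapOffset);
        vm->dirty = true;
    }
    return old;
}

/*
 * Hypothetical common redo helper: apply the bits and stamp the LSN only
 * if the page was actually modified.  Returns true if it was.
 */
static bool
toy_vm_redo_set(ToyVMPage *vm, uint32_t heapBlk, uint8_t flags, uint64_t lsn)
{
    uint8_t     old = toy_vm_set_bits(vm, heapBlk, flags);

    if (old == flags)
        return false;       /* bits already set; leave the LSN alone */
    vm->lsn = lsn;
    return true;
}
```

Replaying the same record twice against this toy page changes nothing the second time, which is the old_vmbits == vmflags case asked about above.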


> @@ -2070,8 +2079,24 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
>  	xlhp_prune_items dead_items;
>  	xlhp_prune_items unused_items;
>  	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
> +	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
> +	bool		do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
>  
>  	xlrec.flags = 0;
> +	regbuf_flags = REGBUF_STANDARD;
> +
> +	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
> +
> +	/*
> +	 * We can avoid an FPI if the only modification we are making to the heap
> +	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.

Maybe s/an FPI/an FPI for the heap page/?


> +	 * Note that if we explicitly skip an FPI, we must not set the heap page
> +	 * LSN later.
> +	 */
> +	if (!do_prune &&
> +		nfrozen == 0 &&
> +		(!do_set_vm || !XLogHintBitIsNeeded()))
> +		regbuf_flags |= REGBUF_NO_IMAGE;

>  	/*
>  	 * Prepare data for the buffer.  The arrays are not actually in the
> @@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
>  	 * page image, the arrays can be omitted.
>  	 */
>  	XLogBeginInsert();
> -	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
> +	XLogRegisterBuffer(0, buffer, regbuf_flags);
> +
> +	if (do_set_vm)
> +		XLogRegisterBuffer(1, vmbuffer, 0);

Seems a bit confusing that it's named regbuf_flags but isn't used in all the
XLogRegisterBuffer calls. Maybe make the name more specific
(regbuf_flags_heap?)...

>  	}
>  	recptr = XLogInsert(RM_HEAP2_ID, info);
>  
> -	PageSetLSN(BufferGetPage(buffer), recptr);
> +	if (do_set_vm)
> +	{
> +		Assert(BufferIsDirty(vmbuffer));
> +		PageSetLSN(BufferGetPage(vmbuffer), recptr);
> +	}

> +	/*
> +	 * We must bump the page LSN if pruning or freezing. If we are only
> +	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
> +	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
> +	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
> +	 * for page hint updates.
> +	 */

Arguably it's not a torn page if we only modified something as narrow as a
hint bit, and are redoing that change after recovery. But that's extremely
nitpicky.

I wonder if the comment explaining this should be put into one place and
referenced from all the different places.
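
One way to keep the explanation in a single place would be a small predicate that both the WAL-logging side and the redo side consult. A minimal sketch, assuming a hypothetical helper name heap_change_needs_lsn() and a plain bool standing in for XLogHintBitIsNeeded(); the conditions mirror the quoted hunks but none of this is in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if the heap page modification must be
 * covered by the page LSN (and so may carry an FPI).  Pruning and freezing
 * always qualify.  Setting only PD_ALL_VISIBLE qualifies only when hint
 * bits are WAL-logged (checksums or wal_log_hints enabled).
 *
 * hint_bits_logged is an assumed stand-in for XLogHintBitIsNeeded().
 */
static bool
heap_change_needs_lsn(bool do_prune, int nfrozen, bool do_set_vm,
                      bool hint_bits_logged)
{
    assert(nfrozen >= 0);

    if (do_prune || nfrozen > 0)
        return true;
    return do_set_vm && hint_bits_logged;
}
```

With something like this, the logging side would add REGBUF_NO_IMAGE exactly when the predicate is false, and both sides would call PageSetLSN() on the heap page exactly when it is true.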

> @@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
>  							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
>  							 InvalidOffsetNumber);
>  
> +	/*
> +	 * Before marking dead items unused, check whether the page will become
> +	 * all-visible once that change is applied.

So the function is named _would_ but here you say will :)


> This lets us reap the tuples
> +	 * and mark the page all-visible within the same critical section,
> +	 * enabling both changes to be emitted in a single WAL record. Since the
> +	 * visibility checks may perform I/O and allocate memory, they must be
> +	 * done outside the critical section.
> +	 */
> +	if (heap_page_would_be_all_visible(vacrel, buffer,
> +									   deadoffsets, num_offsets,
> +									   &all_frozen, &visibility_cutoff_xid))
> +	{
> +		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
> +		if (all_frozen)
> +		{
> +			vmflags |= VISIBILITYMAP_ALL_FROZEN;
> +			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
> +		}
> +
> +		/* Take the lock on the vmbuffer before entering a critical section */
> +		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

It sure would be nice if we had documented the lock order between the heap
page and the corresponding VM page somewhere.  This is just doing what we did
before, so it's not this patch's fault, but I did get worried about it for a
moment.


> +/*
> + * Check whether the heap page in buf is all-visible except for the dead
> + * tuples referenced in the deadoffsets array.
> + *
> + * The visibility checks may perform IO and allocate memory so they must not
> + * be done in a critical section. This function is used by vacuum to determine
> + * if the page will be all-visible once it reaps known dead tuples. That way
> + * it can do both in the same critical section and emit a single WAL record.
> + *
> + * Returns true if the page is all-visible other than the provided
> + * deadoffsets and false otherwise.
> + *
> + * Output parameters:
> + *
> + *  - *all_frozen: true if every tuple on the page is frozen
> + *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
> + * Callers looking to verify that the page is already all-visible can call
> + * heap_page_is_all_visible().
> + *
> + * This logic is closely related to heap_prune_record_unchanged_lp_normal().
> + * If you modify this function, ensure consistency with that code. An
> + * assertion cross-checks that both remain in agreement. Do not introduce new
> + * side-effects.
> + */
> +static bool
> +heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
> +							   OffsetNumber *deadoffsets,
> +							   int ndeadoffsets,
> +							   bool *all_frozen,
> +							   TransactionId *visibility_cutoff_xid)
> +{
>  	Page		page = BufferGetPage(buf);
>  	BlockNumber blockno = BufferGetBlockNumber(buf);
>  	OffsetNumber offnum,
>  				maxoff;
>  	bool		all_visible = true;
> +	int			matched_dead_count = 0;
>  
>  	*visibility_cutoff_xid = InvalidTransactionId;
>  	*all_frozen = true;
>  
> +	Assert(ndeadoffsets == 0 || deadoffsets);
> +
> +#ifdef USE_ASSERT_CHECKING
> +	/* Confirm input deadoffsets[] is strictly sorted */
> +	if (ndeadoffsets > 1)
> +	{
> +		for (int i = 1; i < ndeadoffsets; i++)
> +			Assert(deadoffsets[i - 1] < deadoffsets[i]);
> +	}
> +#endif
> +
>  	maxoff = PageGetMaxOffsetNumber(page);
>  	for (offnum = FirstOffsetNumber;
>  		 offnum <= maxoff && all_visible;
> @@ -3649,9 +3712,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
>  		 */
>  		if (ItemIdIsDead(itemid))
>  		{
> -			all_visible = false;
> -			*all_frozen = false;
> -			break;
> +			if (!deadoffsets ||
> +				matched_dead_count >= ndeadoffsets ||
> +				deadoffsets[matched_dead_count] != offnum)
> +			{
> +				*all_frozen = all_visible = false;
> +				break;
> +			}
> +			matched_dead_count++;
> +			continue;
>  		}
>  
>  		Assert(ItemIdIsNormal(itemid));

Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
the end?
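
A standalone sketch of that check, using simplified stand-in types (ToyItemState and toy_would_be_all_visible() are hypothetical, not the real ItemId/page structures). It assumes the assertion should only apply when the scan ran to completion, since the loop stops early once all_visible turns false and later dead offsets are then never matched:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyItemState
{
    TOY_UNUSED,
    TOY_NORMAL_VISIBLE,
    TOY_NORMAL_INVISIBLE,
    TOY_DEAD
} ToyItemState;

/*
 * Returns true if every item is visible to all, ignoring the dead items
 * listed in deadoffsets[] (1-based, strictly ascending).
 */
static bool
toy_would_be_all_visible(const ToyItemState *items, int maxoff,
                         const int *deadoffsets, int ndeadoffsets)
{
    bool        all_visible = true;
    int         matched_dead_count = 0;

    assert(ndeadoffsets == 0 || deadoffsets);

    for (int offnum = 1; offnum <= maxoff && all_visible; offnum++)
    {
        ToyItemState state = items[offnum - 1];

        if (state == TOY_UNUSED)
            continue;

        if (state == TOY_DEAD)
        {
            /* A dead item is tolerated only if it is the next expected one */
            if (matched_dead_count >= ndeadoffsets ||
                deadoffsets[matched_dead_count] != offnum)
                all_visible = false;
            else
                matched_dead_count++;
            continue;
        }

        if (state == TOY_NORMAL_INVISIBLE)
            all_visible = false;
    }

    /* If the scan completed, every provided dead offset must have matched */
    assert(!all_visible || matched_dead_count == ndeadoffsets);

    return all_visible;
}
```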


> From 6b5fc27f0d80bab1df86a2e6fb51b64fd20c3cbb Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 15 Sep 2025 12:06:19 -0400
> Subject: [PATCH v17 03/15] Assorted trivial heap_page_prune_and_freeze cleanup

Seems like a good idea, but I'm too lazy to go through this in detail.


> From c69a5219a9b792f3c9f6dc730b8810a88d088ae6 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 16 Sep 2025 14:22:10 -0400
> Subject: [PATCH v17 04/15] Add helper for freeze determination to
>  heap_page_prune_and_freeze
> 
> After scanning through the line pointers on the heap page during
> vacuum's first phase, we use several statuses and information we
> collected to determine whether or not we will use the freeze plans we
> assembled.
> 
> Do this in a helper for better readability.


> @@ -663,85 +775,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  	 * Decide if we want to go ahead with freezing according to the freeze
>  	 * plans we prepared, or not.
>  	 */
> -	do_freeze = false;
> - ...
> +	do_freeze = heap_page_will_freeze(params->relation, buffer,
> +									  did_tuple_hint_fpi,
> +									  do_prune,
> +									  do_hint_prune,
> +									  &prstate);
>  

Assuming this is just moving the code, I like this quite a bit.


> From d4a4be3eed25853fc1ea84ebc2cbe0226afd823a Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 15 Sep 2025 16:25:44 -0400
> Subject: [PATCH v17 05/15] Update PruneState.all_[visible|frozen] earlier in
>  pruning
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> In the prune/freeze path, we currently delay clearing all_visible and
> all_frozen when dead items are present. This allows opportunistic
> freezing if the page would otherwise be fully frozen, since those dead
> items are later removed in vacuum’s third phase.
> 
> However, if no freezing will be attempted, there’s no need to delay.
> Clearing the flags promptly avoids extra bookkeeping in
> heap_prune_unchanged_lp_normal(). At present this has no runtime effect
> because all callers that consider setting the VM also attempt freezing,
> but future callers (e.g. on-access pruning) may want to set the VM
> without preparing freeze plans.

s/heap_prune_unchanged_lp_normal/heap_prune_record_unchanged_lp_normal/

I think this should make it clearer that this is about reducing overhead for
future use of this code in on-access-pruning.


> We also used to defer clearing all_visible and all_frozen until after
> computing the visibility cutoff XID. By determining the cutoff earlier,
> we can update these flags immediately after deciding whether to
> opportunistically freeze. This is necessary if we want to set the VM in
> the same WAL record that prunes and freezes tuples on the page.

I think this last sentence needs to be first. This is the only really
important thing in this patch, afaict.



> From 86193a71d2ff9649b5b1c1e6963bd610285ad369 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Fri, 3 Oct 2025 15:57:02 -0400
> Subject: [PATCH v17 06/15] Make heap_page_is_all_visible independent of
>  LVRelState
> 
> Future commits will use this function inside of pruneheap.c where we do
> not have access to the LVRelState. We only need a few parameters from
> the LVRelState, so just pass those in explicitly.
> 
> Author: Melanie Plageman <[email protected]>
> Reviewed-by: Kirill Reshke <[email protected]>
> Reviewed-by: Robert Haas <[email protected]>
> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com

Makes sense. I don't think we need to wait for other stuff to be committed to
commit this.


> From dde0dfc578137f7c93f9a0e34af38dcdb841b080 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 8 Oct 2025 15:39:01 -0400
> Subject: [PATCH v17 07/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum
>  prune/freeze
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit

Seems very mildly odd that 0002 references phase III in the subject, but this
doesn't...

(I'm just very lightly skimming from this point on)


> During vacuum's first and third phases, we examine tuples' visibility
> to determine if we can set the page all-visible in the visibility map.
> 
> Previously, this check compared tuple xmins against a single XID chosen at
> the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> enables future work to set the VM during on-access pruning, since ordinary
> queries have access to GlobalVisState but not OldestXmin.
> 
> This also benefits vacuum directly: GlobalVisState may advance
> during a vacuum, allowing more pages to become considered all-visible.
> In the rare case that it moves backward, VACUUM falls back to OldestXmin
> to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
> prunable according to the GlobalVisState.

It could, but it currently won't advance in vacuum, right?


> From e412f9298b0735d1091f4769ace4d2d1a7e62312 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 29 Jul 2025 09:57:13 -0400
> Subject: [PATCH v17 12/15] Inline TransactionIdFollows/Precedes()
> 
> Calling these from on-access pruning code had noticeable overhead in a
> profile. There does not seem to be a reason not to inline them.

Makes sense, just commit this ahead of the more complicated rest.



> From 54fcba140e515eba0eb1f9d48e7d5875b92e7e39 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 29 Jul 2025 14:34:30 -0400
> Subject: [PATCH v17 13/15] Allow on-access pruning to set pages all-visible

Sorry, will have to look at this another time...

Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-14 03:31  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-10-14 03:31 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Robert Haas <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, 9 Oct 2025 at 03:54, Melanie Plageman <[email protected]> wrote:
>
> On Mon, Oct 6, 2025 at 6:40 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
> > entirely, rather than first removing each caller's heap page from the
> > VM WAL chain. I reordered changes and squashed several refactoring
> > patches to improve patch-by-patch readability. This should make the
> > set read differently from earlier versions that removed
> > XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.
> >
> > I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
> > having intermediate patches that just set PD_ALL_VISIBLE when making
> > other heap pages are more confusing than helpful. Also, I think having
> > separate flags for setting PD_ALL_VISIBLE in the WAL record
> > over-complicated the code.
>
> I decided to reorder the patches to remove XLOG_HEAP2_VISIBLE from
> vacuum phase III before removing it from vacuum phase I because
> removing it from phase III doesn't require preliminary refactoring
> patches. I've done that in the attached v17.
>
> I've also added an experimental patch on the end that refactors large
> chunks of heap_page_prune_and_freeze() into helpers. I got some
> feedback off-list that heap_page_prune_and_freeze() is too unwieldy
> now. I'm not sure how I feel about them yet, so I haven't documented
> them or moved them up in the patch set to before changes to
> heap_page_prune_and_freeze().
>
> 0001: Eliminate XLOG_HEAP2_VISIBLE from COPY FREEZE
> 0002: Eliminate XLOG_HEAP2_VISIBLE from phase III of vacuum
> 0003 - 0006: cleanup and refactoring to prepare for 0007
> 0007: Eliminate XLOG_HEAP2_VISIBLE from vacuum prune/freeze
> 0008 - 0009: Remove XLOG_HEAP2_VISIBLE
> 0010 - 0012: refactoring to prepare for 0013
> 0013: Set VM on-access
> 0014: Set pd_prune_xid on insert
> 0015: Experimental refactoring of heap_page_prune_and_freeze into helpers
>
> - Melanie

Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a,
or do we wait for the full set to be committed?
-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-14 03:42  Michael Paquier <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Michael Paquier @ 2025-10-14 03:42 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Melanie Plageman <[email protected]>; Robert Haas <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Oct 14, 2025 at 08:31:04AM +0500, Kirill Reshke wrote:
> Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
> or do we wait for full set to be committed?

I may be missing something, of course, but d96f87332 has not changed
the WAL format, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN
existing before that.  The change in xl_heap_prune as done in
add323da40a6 should have bumped the format number.
--
Michael


^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-14 14:16  Melanie Plageman <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-10-14 14:16 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Oct 13, 2025 at 11:43 PM Michael Paquier <[email protected]> wrote:
>
> On Tue, Oct 14, 2025 at 08:31:04AM +0500, Kirill Reshke wrote:
> > Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
> > or do we wait for full set to be committed?
>
> I may be missing something, of course, but d96f87332 has not changed
> the WAL format, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN
> existing before that.  The change in xl_heap_prune as done in
> add323da40a6 should have bumped the format number.

Oops! Thanks for reporting.

I messed up and forgot to do this. And, if I'm not misunderstanding
the criteria, I did the same thing at the beginning of September with
4b5f206de2bb. I've committed the bump. Hopefully I learned my lesson.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-14 23:26  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-10-14 23:26 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Robert Haas <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks so much for the review! I've addressed all your feedback except
what is commented on inline below.
I've gone ahead and committed the preliminary patches that you thought
were ready to commit.

Attached v18 is what remains.

0001 - 0003: refactoring
0004 - 0006: finish eliminating XLOG_HEAP2_VISIBLE
0007 - 0009: refactoring
0010: Set VM on-access
0011: Set prune xid on insert
0012: Some refactoring for discussion

For 0001, I got feedback that heap_page_prune_and_freeze() has too many
arguments, so I tried to address that. I'm interested to know if folks
prefer this.

0011 still needs a bit of investigation to understand fully if
anything else in the index-killtuples test needs to be changed to make
sure we have the same coverage.

0012 is sort of WIP. I got feedback that heap_page_prune_and_freeze()
was too long and should be split up into helpers. I want to know if
this split makes sense. I can pull it down the patch stack if so.

Only 0001 and 0012 are optional amongst the refactoring patches. The
others are required to make on-access VM-setting possible or viable.

On Thu, Oct 9, 2025 at 2:18 PM Andres Freund <[email protected]> wrote:
>
> > @@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
> >       }
> > -     action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
> > -                                                                                (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
> > -                                                                                &buffer);
> > -     if (action == BLK_NEEDS_REDO)
> > +     if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
> > +                                                                       (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
> > +                                                                       &buffer) == BLK_NEEDS_REDO)
> >       {
> >               Page            page = BufferGetPage(buffer);
> >               OffsetNumber *redirected;
>
> Why move it around this way?

Because there will also be an action for the visibility map's
XLogReadBufferForRedoExtended() call. I could have renamed this one
heap_action, but it is used in only one place, so I preferred to drop
the variable to avoid any confusion.

> > Advance the page LSN
> > +              * only if the record could include an FPI, since recovery skips
> > +              * records <= the stamped LSN. Otherwise it might skip an earlier FPI
> > +              * needed to repair a torn page.
> > +              */
>
> This is confusing, should probably just reference the stuff we did in the
> !recovery case.

I fixed this and addressed all your feedback related to this before committing.

> > +             if (do_prune || nplans > 0 ||
> > +                     ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
> > +                     PageSetLSN(page, lsn);
> > +
> >               /*
> >                * Note: we don't worry about updating the page's prunability hints.
> >                * At worst this will cause an extra prune cycle to occur soon.
> >                */
>
> Not your fault, but that seems odd? Why aren't we just doing the right thing?

The comment dates back to 6f10eb2. I imagine no one ever bothered to
fuss with extracting the XID. You could change
heap_page_prune_execute() to return the right value -- though that's a
bit ugly since it is used in normal operation as well as recovery.

> I wonder if the VM specific redo portion should be in a common helper? Might
> not be enough code to worry though...

I think it might be more code as a helper at this point.

> > @@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
> >                                                        VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
> >                                                        InvalidOffsetNumber);
> >
> > +     /*
> > +      * Before marking dead items unused, check whether the page will become
> > +      * all-visible once that change is applied.
>
> So the function is named _would_ but here you say will :)

I thought about it more and still feel that this function name should
contain "would". From vacuum's perspective it is "will" -- because it
knows it will remove those dead items, but from the function's
perspective it is hypothetical. I changed the comment though.

> > +     if (heap_page_would_be_all_visible(vacrel, buffer,
> > +                                                                        deadoffsets, num_offsets,
> > +                                                                        &all_frozen, &visibility_cutoff_xid))
> > +     {
> > +             vmflags |= VISIBILITYMAP_ALL_VISIBLE;
> > +             if (all_frozen)
> > +             {
> > +                     vmflags |= VISIBILITYMAP_ALL_FROZEN;
> > +                     Assert(!TransactionIdIsValid(visibility_cutoff_xid));
> > +             }
> > +
> > +             /* Take the lock on the vmbuffer before entering a critical section */
> > +             LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
>
> It sure would be nice if we had documented the lock order between the heap
> page and the corresponding VM page anywhere.  This is just doing what we did
> before, so it's not this patch's fault, but I did get worried about it for a
> moment.

Well, the comment above the visibilitymap_set* functions says what
expectations they have for the heap page being locked.

> > +static bool
> > +heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
> > +                                                        OffsetNumber *deadoffsets,
> > +                                                        int ndeadoffsets,
> > +                                                        bool *all_frozen,
> > +                                                        TransactionId *visibility_cutoff_xid)
> > +{
> >       Page            page = BufferGetPage(buf);

> Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
> the end?

I was going to put an Assert(ndeadoffsets <= matched_dead_count), but
then I started wondering if there is a way we could end up with fewer
dead items than we collected during phase I.

I had considered the case where we drop an index and then do on-access
pruning -- but we don't allow setting LP_DEAD items LP_UNUSED in
on-access pruning. So, maybe this is safe... I can do a follow-on
commit to add the assert. But I'm just not 100% sure I've thought of
all the cases where we might end up with fewer dead items.
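
The check under discussion can be sketched in isolation. This is only
an illustration, assuming a simplified page model (a bool array marking
which line pointers are LP_DEAD) and a deadoffsets array sorted in
ascending order. count_matched_dead is a hypothetical helper, not code
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;	/* stand-in for PostgreSQL's typedef */

/*
 * Count how many LP_DEAD items on the (modelled) page also appear in
 * deadoffsets.  is_dead[i] describes the line pointer at offset i + 1.
 * deadoffsets must be sorted ascending, as assumed in the lead-in.
 */
static int
count_matched_dead(const bool *is_dead, int nitems,
				   const OffsetNumber *deadoffsets, int ndeadoffsets)
{
	int			matched = 0;
	int			next = 0;		/* next candidate in deadoffsets */

	for (int off = 1; off <= nitems; off++)
	{
		if (!is_dead[off - 1])
			continue;

		/* Skip collected offsets that are no longer dead on the page. */
		while (next < ndeadoffsets && deadoffsets[next] < off)
			next++;

		if (next < ndeadoffsets && deadoffsets[next] == off)
		{
			matched++;
			next++;
		}
	}

	/* A match count can never exceed the number of collected offsets. */
	assert(matched <= ndeadoffsets);
	return matched;
}
```

An equality assert at the end of the real function would correspond to
requiring that this count equals ndeadoffsets. If some collected offset
is no longer LP_DEAD by the time of the check, the count comes out lower
and such an assert would fire, which is exactly the open question.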

> > During vacuum's first and third phases, we examine tuples' visibility
> > to determine if we can set the page all-visible in the visibility map.
> >
> > Previously, this check compared tuple xmins against a single XID chosen at
> > the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> > enables future work to set the VM during on-access pruning, since ordinary
> > queries have access to GlobalVisState but not OldestXmin.
> >
> > This also benefits vacuum directly: GlobalVisState may advance
> > during a vacuum, allowing more pages to become considered all-visible.
> > In the rare case that it moves backward, VACUUM falls back to OldestXmin
> > to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
> > prunable according to the GlobalVisState.
>
> It could, but it currently won't advance in vacuum, right?

I thought it was possible for it to advance when calling
heap_prune_satisfies_vacuum() -> GlobalVisTestIsRemovableXid() -> ...
-> GlobalVisUpdate(). This case isn't going to be common, but some
things can cause us to update it.

We have talked about explicitly updating GlobalVisState more often
during vacuums of large tables. But I was under the impression that it
was at least possible for it to advance during vacuum now.

- Melanie


Attachments:

  [text/x-patch] v18-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch (12.8K, 2-v18-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch)
  download | inline diff:
From d385615495305be4d42aeee0422dfeef8d26f3a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v18 01/12] Refactor heap_page_prune_and_freeze() parameters
 into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters, and upcoming work to handle VM updates in this function will
add even more.

Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.

Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
 src/backend/access/heap/pruneheap.c  | 86 +++++++++++++---------------
 src/backend/access/heap/vacuumlazy.c | 16 ++++--
 src/include/access/heapam.h          | 62 ++++++++++++++++----
 src/tools/pgindent/typedefs.list     |  1 +
 4 files changed, 101 insertions(+), 64 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..450b2eb6494 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -258,15 +258,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
 			OffsetNumber dummy_off_loc;
+			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.relation = relation;
+			params.buffer = buffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
+
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			params.options = 0;
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -419,60 +427,43 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now.  The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set.  They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -486,10 +477,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +575,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +778,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = heap_page_will_freeze(relation, buffer,
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
@@ -838,7 +830,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +868,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 71fbd68c8ea..7db7c56311b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1964,10 +1964,16 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
+	params.relation = rel;
+	params.buffer = buf;
+	params.reason = PRUNE_VACUUM_SCAN;
+	params.cutoffs = &vacrel->cutoffs;
+	params.vistest = vacrel->vistest;
+
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
 	 *
@@ -1983,12 +1989,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8cbff6ab0eb..74a5c24002b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It is used to set the WAL record
+	 * opcode which is used for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+	 * pruning.
+	 *
+	 * FREEZE indicates that we will also freeze tuples, and will return
+	 * 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
+	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
+	 * option is set. cutoffs->OldestXmin is also used to determine if dead
+	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5290b91e83e..b221b3699bf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2342,6 +2342,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0



  [text/x-patch] v18-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (5.3K, 3-v18-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
  download | inline diff:
From 4c3f113fbd62b553949b95cb352347767278e7dc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v18 02/12] Keep all_frozen updated in
 heap_page_prune_and_freeze

Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.

Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
---
 src/backend/access/heap/pruneheap.c  | 22 +++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  9 ++++-----
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 450b2eb6494..daa719fc2a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -361,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -784,6 +782,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  do_hint_prune,
 									  &prstate);
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -853,7 +853,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
+				if (prstate.all_frozen)
 					frz_conflict_horizon = prstate.visibility_cutoff_xid;
 				else
 				{
@@ -1418,7 +1418,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1440,7 +1440,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1453,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1472,7 +1472,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1490,7 +1490,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7db7c56311b..58de605ca09 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2020,7 +2020,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2074,6 +2073,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2179,11 +2179,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0



  [text/x-patch] v18-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch (9.7K, 4-v18-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch)
  download | inline diff:
From 181368d080f6a73304c4f248739ca08f85a737c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v18 03/12] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.

The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.

This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
---
 src/backend/access/heap/pruneheap.c | 117 ++++++++++++++--------------
 1 file changed, 57 insertions(+), 60 deletions(-)
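
For reviewers, a minimal standalone sketch of the horizon selection described in the commit message above. This is not the patch code: xid_t, xid_retreat(), xid_follows(), freeze_conflict_horizon() and record_conflict_xid() are hypothetical stand-ins for TransactionId, TransactionIdRetreat(), TransactionIdFollows() and the corresponding logic in pruneheap.c. It assumes the decision to freeze has already been made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;			/* hypothetical stand-in for TransactionId */

#define INVALID_XID			((xid_t) 0)
#define FIRST_NORMAL_XID	((xid_t) 3)

/* Step back one XID, skipping the special XIDs below FIRST_NORMAL_XID. */
static xid_t
xid_retreat(xid_t xid)
{
	do
	{
		xid--;
	} while (xid < FIRST_NORMAL_XID);
	return xid;
}

/* Wraparound-aware "a is newer than b"; special XIDs compare numerically. */
static bool
xid_follows(xid_t a, xid_t b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Horizon for a record that freezes tuples.  If the whole page will be
 * all-frozen, the newest xmin of its live tuples is precise enough.
 * Otherwise fall back to a conservative value just before OldestXmin.
 */
static xid_t
freeze_conflict_horizon(bool all_frozen, xid_t visibility_cutoff_xid,
						xid_t oldest_xmin)
{
	if (all_frozen)
		return visibility_cutoff_xid;
	return xid_retreat(oldest_xmin);
}

/* The record-level horizon is the newer of the freeze and prune horizons. */
static xid_t
record_conflict_xid(xid_t frz_conflict_horizon, xid_t latest_xid_removed)
{
	if (xid_follows(frz_conflict_horizon, latest_xid_removed))
		return frz_conflict_horizon;
	return latest_xid_removed;
}
```

Because the freeze horizon is computed while all_frozen still ignores LP_DEAD items, the flags can be corrected for LP_DEAD items right after the freeze decision without changing the horizon that ends up in the WAL record.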

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index daa719fc2a1..ef8861022f1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,11 +138,11 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze() to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether to opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -175,7 +175,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
-								  PruneState *prstate);
+								  PruneState *prstate, TransactionId *frz_conflict_horizon);
 
 
 /*
@@ -308,7 +308,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * performs several pre-freeze checks.
  *
  * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record should we decide to
+ * freeze tuples.
  *
  * prstate is both an input and output parameter.
  *
@@ -320,7 +322,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -390,6 +393,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -434,10 +453,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
  * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set.  They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opporunistically freeze, to indicate if the
+ * VM bits can be set.  They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -473,6 +493,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
@@ -542,10 +563,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -780,7 +801,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
+
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
@@ -842,27 +880,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -888,30 +907,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
-- 
2.43.0



  [text/x-patch] v18-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (41.1K, 5-v18-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 6db4df888158810125c25fa00a05fe31342a9c0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v18 04/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/heapam_xlog.c |  37 ++-
 src/backend/access/heap/pruneheap.c   | 434 ++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c  | 207 +-----------
 src/include/access/heapam.h           |  43 ++-
 4 files changed, 421 insertions(+), 300 deletions(-)
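
For reviewers, a rough standalone sketch of the decision that heap_page_will_set_vis() makes in this patch, leaving out the two corruption checks. The names page_summary, finalize_visibility() and will_set_vm() are hypothetical, the VM lookups are replaced by plain booleans, and the flag values are only meant to mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE	0x01	/* sketch value for VISIBILITYMAP_ALL_VISIBLE */
#define VM_ALL_FROZEN	0x02	/* sketch value for VISIBILITYMAP_ALL_FROZEN */

/* Hypothetical subset of PruneState used by this sketch. */
typedef struct
{
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} page_summary;

/*
 * After the freeze decision, make the flags reflect the true page state:
 * a page that still has LP_DEAD items is neither all-visible nor all-frozen.
 */
static void
finalize_visibility(page_summary *ps)
{
	if (ps->lpdead_items > 0)
		ps->all_visible = ps->all_frozen = false;
	assert(!ps->all_frozen || ps->all_visible);
}

/*
 * Returns true if the VM needs updating and fills *vmflags with the bits to
 * set.  *set_pd_all_visible reports whether the heap page flag must be set.
 */
static bool
will_set_vm(const page_summary *ps, bool blk_known_av, bool vm_all_frozen,
			bool page_all_visible, uint8_t *vmflags, bool *set_pd_all_visible)
{
	*vmflags = 0;
	*set_pd_all_visible = ps->all_visible && !page_all_visible;

	if ((ps->all_visible && !blk_known_av) ||
		(ps->all_frozen && !vm_all_frozen))
	{
		*vmflags = VM_ALL_VISIBLE;
		if (ps->all_frozen)
			*vmflags |= VM_ALL_FROZEN;
		return true;
	}
	return false;
}
```

The case where only the VM changes (an already all-visible page becoming all-frozen) falls out of the second condition: will_set_vm() returns true while *set_pd_all_visible stays false, so the heap page itself need not be dirtied for the VM update.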

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 230d9888793..412ac3edf25 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef8861022f1..b38b62779ab 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -133,16 +135,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -173,10 +176,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate, TransactionId *frz_conflict_horizon);
-
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -262,6 +276,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
+			params.vmbuffer = InvalidBuffer;
+			params.blk_known_av = false;
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -434,10 +450,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -452,12 +566,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * it's required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -482,6 +597,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -491,15 +607,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
 	prstate.mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = params->cutoffs;
 
 	/*
@@ -546,50 +669,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
+	 *
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * when we encounter LP_DEAD items. Instead, we correct all_visible after
+	 * deciding whether to freeze, but before updating the VM, to avoid
+	 * setting the VM bit incorrectly.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.attempt_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -821,6 +948,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -842,14 +997,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -863,35 +1021,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
+		{
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -901,28 +1067,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1413,6 +1598,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = prstate->all_frozen = false;
@@ -2058,6 +2245,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		conflict_xid = InvalidTransactionId;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2082,6 +2327,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2091,6 +2345,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2127,7 +2382,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2245,7 +2500,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 58de605ca09..985a66bdb2e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,13 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
@@ -1973,6 +1966,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	params.reason = PRUNE_VACUUM_SCAN;
 	params.cutoffs = &vacrel->cutoffs;
 	params.vistest = vacrel->vistest;
+	params.vmbuffer = vmbuffer;
+	params.blk_known_av = all_visible_according_to_vm;
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1989,7 +1984,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	params.options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -2012,33 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2072,168 +2040,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2955,6 +2781,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -3642,7 +3469,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId OldestXmin,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 74a5c24002b..cb70f8ec562 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
 	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
 	 * pruning.
 	 *
-	 * FREEZE indicates that we will also freeze tuples, and will return
-	 * 'all_visible', 'all_frozen' flags to the caller.
+	 * FREEZE indicates that we will also freeze tuples.
+	 *
+	 * UPDATE_VIS indicates that we will set the page's status in the VM.
 	 */
 	int			options;
 
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -423,6 +431,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
@@ -433,6 +442,14 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
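
The lazy_scan_prune() hunk above replaces the old visibilitymap_set() branches with a comparison of presult.old_vmbits and presult.new_vmbits. A minimal standalone sketch of that classification follows, assuming the usual bit values 0x01 (all-visible) and 0x02 (all-frozen). The enum and the function name classify_vm_transition() are hypothetical and exist only for illustration; the patch itself bumps the vacrel counters directly.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical labels for the counters the patch increments. */
typedef enum VmTransition
{
	VM_UNCHANGED,				/* no counter is bumped */
	VM_NEWLY_VISIBLE,			/* vm_new_visible_pages only */
	VM_NEWLY_VISIBLE_FROZEN,	/* vm_new_visible_pages and
								 * vm_new_visible_frozen_pages */
	VM_NEWLY_FROZEN				/* vm_new_frozen_pages only */
} VmTransition;

/* Classify a VM update from the bits before and after pruning. */
static VmTransition
classify_vm_transition(uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
			return VM_NEWLY_VISIBLE_FROZEN;
		return VM_NEWLY_VISIBLE;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* A page can only be all-frozen if it is also all-visible. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		return VM_NEWLY_FROZEN;
	}
	return VM_UNCHANGED;
}
```

In the patch, both the VM_NEWLY_VISIBLE_FROZEN and VM_NEWLY_FROZEN cases also set *vm_page_frozen = true.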


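The get_conflict_xid() helper added in the patch above picks one snapshot conflict horizon for the whole prune/freeze/VM record. The following is a self-contained sketch of the same decision order, not the patch's code: the names xid_follows() and sketch_conflict_xid() are hypothetical, and xid_follows() is a simplified stand-in for TransactionIdFollows(), assuming 32-bit XIDs where values below 3 are special.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Simplified wraparound-aware "a is newer than b" comparison. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	/* Special XIDs compare as plain unsigned integers. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid,
					bool blk_already_av, bool set_blk_all_frozen)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Start from the VM horizon, else from the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer removed xmax always wins. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* Only marking an already all-visible page all-frozen: no conflict. */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

For example, with do_set_vm set, a visibility cutoff of 100, and a latest removed XID of 120, the sketch returns 120, because the removed tuple's xmax is newer than the cutoff.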

  [text/x-patch] v18-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 6-v18-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 393173db3db25838dd638ad334eb31ed09cb4f1e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v18 05/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 985a66bdb2e..14a8e342e51 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,9 +1878,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1897,13 +1900,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
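
The empty-page path above passes set_pd_all_vis = true with nothing pruned or frozen. Whether the heap page then gets a full-page image and a new LSN depends on the two conditions changed in log_heap_prune_and_freeze() in the previous patch. A minimal sketch of those predicates follows; the function names are hypothetical, and hint_bits_need_wal stands in for the result of XLogHintBitIsNeeded().

```c
#include <stdbool.h>

/*
 * Mirrors the condition guarding PageSetLSN() on the heap page: the LSN
 * advances if tuples were pruned or frozen, or if PD_ALL_VISIBLE was set
 * while checksums/wal_log_hints require hint-bit changes to be logged.
 */
static bool
heap_page_lsn_must_advance(bool do_prune, int nfrozen,
						   bool set_pd_all_vis, bool hint_bits_need_wal)
{
	return do_prune || nfrozen > 0 ||
		(set_pd_all_vis && hint_bits_need_wal);
}

/*
 * Mirrors the condition for adding REGBUF_NO_IMAGE to the heap buffer's
 * registration flags: it is the exact negation of the predicate above.
 */
static bool
heap_buffer_skips_fpi(bool do_prune, int nfrozen,
					  bool set_pd_all_vis, bool hint_bits_need_wal)
{
	return !do_prune && nfrozen == 0 &&
		(!set_pd_all_vis || !hint_bits_need_wal);
}
```

Keying both checks on set_pd_all_vis rather than do_set_vm covers the case described in the function's header comment, where vmflags is non-zero but PD_ALL_VISIBLE was already set.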



  [text/x-patch] v18-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.2K, 7-v18-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From 19c3b3150d386545c309a72fe21bcc7db11dbcb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v18 06/12] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 111 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 54 insertions(+), 378 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 568696333c2..f881530a2a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 412ac3edf25..5eafdff6c2e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -778,8 +642,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -791,11 +655,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1376,9 +1240,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b38b62779ab..d4006803330 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1031,9 +1031,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2308,14 +2308,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14a8e342e51..d14f69ccb0a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,11 +1900,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2786,9 +2786,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2f5e61e2392..a75b5bb6b13 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
@@ -344,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b221b3699bf..3ef4c06c85d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4277,7 +4277,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v18-0007-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.2K, 8-v18-0007-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From bd2d6a5ee19706a4e7d51e1df3479234fe28e3fc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v18 07/12] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order
to decide whether a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose. The FullTransactionId variant, GlobalVisTestIsRemovableFullXid(),
is renamed to GlobalVisFullXidVisibleToAll() to match.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 16 ++++++++--------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)
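
The 32-bit wrapper touched by this patch widens its argument to a full
(64-bit) XID relative to a reference value before calling the FullXid
variant. The sketch below illustrates that widening step only. It is a
toy model under stated assumptions, not the body of FullXidRelativeTo():
ToyXid, ToyFullXid and toy_full_xid_relative_to() are hypothetical names,
and the sketch assumes the 32-bit XID lies within 2^31 of the reference,
which is why the wrapper's comment insists on wraparound-protected
sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not PostgreSQL types. */
typedef uint32_t ToyXid;		/* 32-bit XID as stored in a tuple header */
typedef uint64_t ToyFullXid;	/* epoch in the high 32 bits, XID in the low */

/*
 * Widen a 32-bit xid to 64 bits by interpreting it relative to 'rel'.
 * The signed 32-bit difference places xid either just before or just
 * after rel, so an xid from the previous epoch maps below rel.
 */
static ToyFullXid
toy_full_xid_relative_to(ToyFullXid rel, ToyXid xid)
{
	ToyXid		rel_xid = (ToyXid) rel;	/* low 32 bits of the reference */
	int32_t		delta = (int32_t) (xid - rel_xid);

	/* Unsigned addition wraps, so a negative delta subtracts from rel. */
	return rel + (uint64_t) (int64_t) delta;
}

/* Small self-check of the two interesting cases. */
static void
toy_widen_selfcheck(void)
{
	ToyFullXid	rel = UINT64_C(0x100000005);	/* epoch 1, xid 5 */

	/* An xid slightly ahead of the reference stays in the same epoch. */
	assert(toy_full_xid_relative_to(rel, 10) == UINT64_C(0x10000000A));

	/* An xid "behind" the reference is placed in the previous epoch. */
	assert(toy_full_xid_relative_to(rel, 0xFFFFFFF0u) == UINT64_C(0xFFFFFFF0));
}
```

If the xid came from a source without wraparound protection, the signed
difference could pick the wrong epoch, and the widened value would be
meaningless.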

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d4006803330..40d0ae6fcde 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -235,7 +235,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -730,9 +730,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1157,11 +1157,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1618,7 +1618,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
+				 * could use GlobalVisXidVisibleToAll() instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v18-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 9-v18-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From a2c8648351211dec01a107c80325a64a618ecafe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v18 08/12] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
 src/backend/access/heap/pruneheap.c         | 37 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 17 +++++-----
 src/include/access/heapam.h                 |  7 ++--
 4 files changed, 57 insertions(+), 32 deletions(-)
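
The deferred check described in the commit message can be illustrated
with a toy model: track the newest xmin while scanning the page, then
test that single value against the horizon once. This is a minimal
sketch, not code from the patch. ToyXid, ToyVisState,
toy_xid_visible_to_all() and toy_page_all_visible() are hypothetical
names, and the plain integer comparisons ignore XID wraparound, which
the patch itself handles via TransactionIdFollows() and
GlobalVisXidVisibleToAll().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not PostgreSQL definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct ToyVisState
{
	ToyXid		horizon;	/* XIDs strictly older than this are visible to all */
	int			nchecks;	/* how many horizon checks were performed */
} ToyVisState;

/* Models the comparatively expensive horizon test; counts each call. */
static bool
toy_xid_visible_to_all(ToyVisState *state, ToyXid xid)
{
	state->nchecks++;
	return xid < state->horizon;
}

/*
 * Decide whether a page whose live tuples have the given xmins is
 * all-visible.  The loop only tracks the newest xmin; the horizon is
 * consulted once, after the scan.  If the newest xmin is visible to
 * everyone, every older xmin on the page is too.
 */
static bool
toy_page_all_visible(ToyVisState *state, const ToyXid *xmins, size_t ntuples)
{
	ToyXid		newest = TOY_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		assert(xmins[i] != TOY_INVALID_XID);
		if (xmins[i] > newest)
			newest = xmins[i];
	}

	/* No tuples means there is no xmin to check. */
	if (newest == TOY_INVALID_XID)
		return true;

	return toy_xid_visible_to_all(state, newest);
}
```

The trade-off matches the one noted above: the scan can no longer stop
at the first too-new xmin, but the horizon test runs at most once per
page instead of once per tuple.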

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 40d0ae6fcde..6fc737eed69 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -712,11 +712,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -912,6 +913,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1084,10 +1095,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1615,19 +1625,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisXidVisibleToAll() instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d14f69ccb0a..92ad096d935 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2740,7 +2740,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3495,14 +3495,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3523,7 +3522,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3542,7 +3541,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3616,7 +3615,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3635,7 +3634,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cb70f8ec562..00213fad852 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
 	/*
 	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
 	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
-	 * option is set. cutoffs->OldestXmin is also used to determine if dead
-	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 * option is set.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void heap_vacuum_rel(Relation rel,
 
 #ifdef USE_ASSERT_CHECKING
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -457,6 +456,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
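
The hunks above replace the per-tuple OldestXmin comparison with a single test, made after the whole page has been processed, on the newest xmin seen. A minimal sketch of that pattern follows. It is not the patch's code: the `Toy*` types and functions are hypothetical stand-ins, the visibility test is reduced to a plain horizon comparison, and xids are compared with `>` rather than the wraparound-aware TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

/* Hypothetical stand-in for GlobalVisState: xids below horizon are visible to all. */
typedef struct ToyVisState
{
	ToyXid		horizon;
} ToyVisState;

/* Hypothetical stand-in for the fields of PruneState used here. */
typedef struct ToyXminTracker
{
	bool		all_visible;
	ToyXid		visibility_cutoff_xid;	/* newest xmin seen so far */
} ToyXminTracker;

static bool
toy_xid_visible_to_all(const ToyVisState *vis, ToyXid xid)
{
	return xid < vis->horizon;
}

/* Per live tuple: only remember the newest xmin, no visibility test yet. */
static void
toy_track_live_tuple(ToyXminTracker *st, ToyXid xmin)
{
	/* Simplification: real xid comparison must handle wraparound. */
	if (xmin != TOY_INVALID_XID && xmin > st->visibility_cutoff_xid)
		st->visibility_cutoff_xid = xmin;
}

/* Once per page: if the newest xmin is not visible to all, neither is the page. */
static void
toy_finish_page(ToyXminTracker *st, const ToyVisState *vis)
{
	if (st->all_visible &&
		st->visibility_cutoff_xid != TOY_INVALID_XID &&
		!toy_xid_visible_to_all(vis, st->visibility_cutoff_xid))
		st->all_visible = false;
}

static bool
toy_page_all_visible(const ToyXid *xmins, int n, ToyXid horizon)
{
	ToyVisState vis = {horizon};
	ToyXminTracker st = {true, TOY_INVALID_XID};

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		toy_track_live_tuple(&st, xmins[i]);
	toy_finish_page(&st, &vis);
	return st.all_visible;
}
```

Under these assumptions, a page whose live tuples have xmins 90, 95 and 99 is all-visible against a horizon of 100, while a page containing xmin 120 is not. The visibility test runs once per page instead of once per tuple.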



  [text/x-patch] v18-0009-Unset-all_visible-sooner-if-not-freezing.patch (2.4K, 10-v18-0009-Unset-all_visible-sooner-if-not-freezing.patch)
From 51ea8c8266da0947c46951279d13fc8834f0ca45 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v18 09/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6fc737eed69..2979cb74651 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1483,8 +1483,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1739,8 +1742,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
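
The effect of patch 0009 can be illustrated with a small sketch. It is a simplified model rather than the patch itself: `ToyDeadState`, `toy_record_dead()` and `toy_finish_dead_page()` are hypothetical names, and the end-of-page step only approximates the later unsetting that the code comments describe.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ITEMS 16

/* Hypothetical stand-in for the PruneState fields touched by the patch. */
typedef struct ToyDeadState
{
	bool		attempt_freeze;
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
	int			deadoffsets[TOY_MAX_ITEMS];
} ToyDeadState;

/* Record a dead item; clear the flags right away only when not freezing. */
static void
toy_record_dead(ToyDeadState *st, int offnum)
{
	assert(st->lpdead_items < TOY_MAX_ITEMS);

	/*
	 * When freezing may be attempted, the flags stay set for now so that
	 * removable dead tuples do not preclude freezing the page.
	 */
	if (!st->attempt_freeze)
		st->all_visible = st->all_frozen = false;

	st->deadoffsets[st->lpdead_items++] = offnum;
}

/* Assumed end-of-page step: dead items always rule out all-visible. */
static void
toy_finish_dead_page(ToyDeadState *st)
{
	if (st->lpdead_items > 0)
		st->all_visible = st->all_frozen = false;
}
```

In this model both paths end with the flags cleared before any VM update. The non-freezing path just gets there at the first dead item, so later per-tuple bookkeeping that is gated on all_visible can be skipped.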



  [text/x-patch] v18-0010-Allow-on-access-pruning-to-set-pages-all-visible.patch (28.0K, 11-v18-0010-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From fc3618d0940f6698a009ec2ddc7886d975374cc6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v18 10/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
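
As a rough illustration of the plumbing described above, the sketch below derives a scan flag from a set of modified range-table indexes. It is a toy model under stated assumptions: the patch itself tracks the indexes in a Bitmapset (es_modified_relids), whereas this sketch uses a 64-bit mask, and every `toy_`/`TOY_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values mirroring the shape of ScanOptions. */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_ALLOW_VM_SET = 1 << 10
};

/* Hypothetical stand-in for EState: one bit per range-table index. */
typedef struct ToyEState
{
	uint64_t	modified_relids;
} ToyEState;

/* Called for result relations and for relations with row marks. */
static void
toy_mark_modified(ToyEState *estate, int rti)
{
	assert(rti > 0 && rti < 64);
	estate->modified_relids |= UINT64_C(1) << rti;
}

static bool
toy_rel_is_modified(const ToyEState *estate, int rti)
{
	assert(rti > 0 && rti < 64);
	return ((estate->modified_relids >> rti) & 1) != 0;
}

/* Scan setup: allow VM setting only if the query does not modify the rel. */
static uint32_t
toy_scan_flags(const ToyEState *estate, int scanrelid)
{
	uint32_t	flags = TOY_SO_TYPE_SEQSCAN;

	if (!toy_rel_is_modified(estate, scanrelid))
		flags |= TOY_SO_ALLOW_VM_SET;
	return flags;
}
```

With relation 2 marked as modified, a scan of relation 1 gets TOY_SO_ALLOW_VM_SET and a scan of relation 2 does not. This corresponds to the case where a scan node passes no VM buffer to on-access pruning.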
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 73 +++++++++++++++----
 src/backend/access/index/indexam.c            | 46 ++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++-
 src/backend/executor/execMain.c               |  4 +
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 ++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 +++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 +++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 284 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f881530a2a5..d8594b9aac1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2979cb74651..6e863ffd85e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -188,7 +188,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -203,9 +205,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -271,12 +277,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.options = 0;
+			params.vmbuffer = InvalidBuffer;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			params.relation = relation;
 			params.buffer = buffer;
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
-			params.vmbuffer = InvalidBuffer;
 			params.blk_known_av = false;
 
 			/*
@@ -456,6 +471,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -466,7 +484,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -482,6 +502,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -505,6 +542,11 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * This will never trigger for on-access pruning because it couldn't have
+	 * done a previous visibility map lookup and thus will always pass
+	 * blk_known_av as false. A future vacuum will have to take care of fixing
+	 * the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -913,6 +955,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -923,14 +973,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -974,6 +1016,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2250,7 +2293,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2319,8 +2362,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 00213fad852..342560e1034 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v18-0011-Set-pd_prune_xid-on-insert.patch (6.7K, 12-v18-0011-Set-pd_prune_xid-on-insert.patch)
From ac4338510fe32446375801ffd78b38367a87a56b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v18 11/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

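Illustrative aside, not part of the patch: the commit message relies on how the pd_prune_xid hint behaves when it is set repeatedly by inserts. The sketch below is a toy model of that behavior as I understand it from PageSetPrunable(): the hint is only ever moved to an older XID, and non-normal XIDs (as in bootstrap mode) leave it untouched. All toy_* names are hypothetical stand-ins and not PostgreSQL APIs, and the real macro asserts on a non-normal XID rather than silently ignoring it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and its special values. */
typedef uint32_t toy_xid;

#define TOY_INVALID_XID		 ((toy_xid) 0)
#define TOY_FIRST_NORMAL_XID ((toy_xid) 3)

/* Models only the one page header field of interest (pd_prune_xid). */
typedef struct toy_page
{
	toy_xid		prune_xid;
} toy_page;

/* Bootstrap and frozen XIDs sort below the first normal XID. */
static bool
toy_xid_is_normal(toy_xid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/*
 * Wraparound-aware "a is older than b". Only valid when both XIDs are
 * normal, which is the only case this sketch passes in.
 */
static bool
toy_xid_precedes(toy_xid a, toy_xid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Record xid as a pruning hint on the page. Returns true if the hint
 * changed. The hint is kept at the oldest XID seen so far, so repeated
 * inserts by newer transactions do not overwrite it.
 */
static bool
toy_set_prunable(toy_page *page, toy_xid xid)
{
	if (!toy_xid_is_normal(xid))
		return false;			/* assumption: caller skips these, as the patch does */

	if (page->prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, page->prune_xid))
	{
		page->prune_xid = xid;
		return true;
	}

	return false;
}
```

In this model, inserting with XID 100 and then XID 200 leaves the hint at 100, which is why a later reader only needs to compare one XID against its horizon to decide whether pruning is worth attempting.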
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d8594b9aac1..a3e2c4c20cd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5eafdff6c2e..21972347dec 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -463,6 +463,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -612,9 +618,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



  [text/x-patch] v18-0012-Split-heap_page_prune_and_freeze-into-helpers.patch (17.8K, 13-v18-0012-Split-heap_page_prune_and_freeze-into-helpers.patch)
From 163e09cb81eeb1af31cd9b3a648896845587ce3a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 18:45:45 -0400
Subject: [PATCH v18 12/12] Split heap_page_prune_and_freeze into helpers

---
 src/backend/access/heap/pruneheap.c | 316 +++++++++++++++-------------
 1 file changed, 170 insertions(+), 146 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6e863ffd85e..d21a66f6a75 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -590,82 +590,20 @@ heap_page_will_set_vis(Relation relation,
 	return do_set_vm;
 }
 
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page. If the page's visibility status has changed, update it in
- * the VM.
- *
- * Caller must have pin and buffer cleanup lock on the page.  Note that we
- * don't update the FSM information for page on caller's behalf.  Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
- * it's required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.
- *
- * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
- * the page has changed, we will update the VM at the same time as pruning and
- * freezing the heap page. We will also update presult->old_vmbits and
- * presult->new_vmbits with the state of the VM before and after updating it
- * for the caller to use in bookkeeping.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.  Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far.  They will be updated
- * with oldest values present on the page after pruning.  After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
-						   PruneFreezeResult *presult,
-						   OffsetNumber *off_loc,
-						   TransactionId *new_relfrozen_xid,
-						   MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params, PruneState *prstate,
+				   TransactionId *new_relfrozen_xid,
+				   MultiXactId *new_relmin_mxid,
+				   PruneFreezeResult *presult)
 {
-	Buffer		buffer = params->buffer;
-	Buffer		vmbuffer = params->vmbuffer;
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber offnum,
-				maxoff;
-	PruneState	prstate;
-	HeapTupleData tup;
-	bool		do_freeze;
-	bool		do_prune;
-	bool		do_hint_prune;
-	bool		do_set_vm;
-	bool		do_set_pd_vis;
-	bool		did_tuple_hint_fpi;
-	int64		fpi_before = pgWalUsage.wal_fpi;
-	TransactionId frz_conflict_horizon = InvalidTransactionId;
-	TransactionId conflict_xid = InvalidTransactionId;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
-
 	/* Copy parameters to prstate */
-	prstate.vistest = params->vistest;
-	prstate.mark_unused_now =
+	prstate->vistest = params->vistest;
+	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.attempt_update_vm =
+	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
 		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
-	prstate.cutoffs = params->cutoffs;
+	prstate->cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -678,37 +616,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * prunable, we will save the lowest relevant XID in new_prune_xid. Also
 	 * initialize the rest of our working state.
 	 */
-	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.latest_xid_removed = InvalidTransactionId;
-	prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
-	prstate.nroot_items = 0;
-	prstate.nheaponly_items = 0;
+	prstate->new_prune_xid = InvalidTransactionId;
+	prstate->latest_xid_removed = InvalidTransactionId;
+	prstate->nredirected = prstate->ndead = prstate->nunused = prstate->nfrozen = 0;
+	prstate->nroot_items = 0;
+	prstate->nheaponly_items = 0;
 
 	/* initialize page freezing working state */
-	prstate.pagefrz.freeze_required = false;
-	if (prstate.attempt_freeze)
+	prstate->pagefrz.freeze_required = false;
+	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
-		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
-		prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+		prstate->pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
 	}
 	else
 	{
 		Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
-		prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
 	}
 
-	prstate.ndeleted = 0;
-	prstate.live_tuples = 0;
-	prstate.recently_dead_tuples = 0;
-	prstate.hastup = false;
-	prstate.lpdead_items = 0;
-	prstate.deadoffsets = presult->deadoffsets;
+	prstate->ndeleted = 0;
+	prstate->live_tuples = 0;
+	prstate->recently_dead_tuples = 0;
+	prstate->hastup = false;
+	prstate->lpdead_items = 0;
+	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
 	 * Track whether the page could be marked all-visible and/or all-frozen.
@@ -736,20 +674,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * bookkeeping. In this case, initializing all_visible to false allows
 	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
-	if (prstate.attempt_freeze)
+	if (prstate->attempt_freeze)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = true;
+		prstate->all_visible = true;
+		prstate->all_frozen = true;
 	}
-	else if (prstate.attempt_update_vm)
+	else if (prstate->attempt_update_vm)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = false;
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
 	}
 	else
 	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
 	}
 
 	/*
@@ -761,10 +699,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * used to calculate the snapshot conflict horizon when updating the VM
 	 * and/or freezing all the tuples on the page.
 	 */
-	prstate.visibility_cutoff_xid = InvalidTransactionId;
+	prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
 
-	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(params->relation);
+static void
+prune_freeze_plan(PruneState *prstate, BlockNumber blockno, Buffer buffer, Page page,
+				  OffsetNumber maxoff, OffsetNumber *off_loc, HeapTuple tup)
+{
+	OffsetNumber offnum;
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -799,13 +741,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		*off_loc = offnum;
 
-		prstate.processed[offnum] = false;
-		prstate.htsv[offnum] = -1;
+		prstate->processed[offnum] = false;
+		prstate->htsv[offnum] = -1;
 
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
 			continue;
 		}
 
@@ -815,17 +757,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * If the caller set mark_unused_now true, we can set dead line
 			 * pointers LP_UNUSED now.
 			 */
-			if (unlikely(prstate.mark_unused_now))
-				heap_prune_record_unused(&prstate, offnum, false);
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
 			continue;
 		}
 
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* This is the start of a HOT chain */
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 			continue;
 		}
 
@@ -835,25 +777,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Get the tuple's visibility status and queue it up for processing.
 		 */
 		htup = (HeapTupleHeader) PageGetItem(page, itemid);
-		tup.t_data = htup;
-		tup.t_len = ItemIdGetLength(itemid);
-		ItemPointerSet(&tup.t_self, blockno, offnum);
+		tup->t_data = htup;
+		tup->t_len = ItemIdGetLength(itemid);
+		ItemPointerSet(&tup->t_self, blockno, offnum);
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, tup,
+															buffer);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 		else
-			prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+			prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
 	}
 
-	/*
-	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
-	 * an FPI to be emitted.
-	 */
-	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
 	/*
 	 * Process HOT chains.
 	 *
@@ -865,30 +801,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * the page instead of using the root_items array, also did it in
 	 * ascending offset number order.)
 	 */
-	for (int i = prstate.nroot_items - 1; i >= 0; i--)
+	for (int i = prstate->nroot_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.root_items[i];
+		offnum = prstate->root_items[i];
 
 		/* Ignore items already processed as part of an earlier chain */
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
 	}
 
 	/*
 	 * Process any heap-only tuples that were not already processed as part of
 	 * a HOT chain.
 	 */
-	for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+	for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.heaponly_items[i];
+		offnum = prstate->heaponly_items[i];
 
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
@@ -907,7 +843,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * return true for an XMIN_INVALID tuple, so this code will work even
 		 * when there were sequential updates within the aborted transaction.)
 		 */
-		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+		if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
 		{
 			ItemId		itemid = PageGetItemId(page, offnum);
 			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -915,8 +851,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
 			{
 				HeapTupleHeaderAdvanceConflictHorizon(htup,
-													  &prstate.latest_xid_removed);
-				heap_prune_record_unused(&prstate, offnum, true);
+													  &prstate->latest_xid_removed);
+				heap_prune_record_unused(prstate, offnum, true);
 			}
 			else
 			{
@@ -933,7 +869,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -944,12 +880,110 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	{
 		*off_loc = offnum;
 
-		Assert(prstate.processed[offnum]);
+		Assert(prstate->processed[offnum]);
 	}
 #endif
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate->all_visible &&
+		TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate->vistest, prstate->visibility_cutoff_xid))
+		prstate->all_visible = prstate->all_frozen = false;
+
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
+ *
+ * Caller must have pin and buffer cleanup lock on the page.  Note that we
+ * don't update the FSM information for page on caller's behalf.  Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze tuples
+ * if it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now.  The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.  Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+						   PruneFreezeResult *presult,
+						   OffsetNumber *off_loc,
+						   TransactionId *new_relfrozen_xid,
+						   MultiXactId *new_relmin_mxid)
+{
+	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
+	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	OffsetNumber maxoff;
+	PruneState	prstate;
+	HeapTupleData tup;
+	bool		do_freeze;
+	bool		do_prune;
+	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
+	bool		did_tuple_hint_fpi;
+	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
+	maxoff = PageGetMaxOffsetNumber(page);
+	tup.t_tableOid = RelationGetRelid(params->relation);
+
+	/* Initialize needed state in prstate */
+	prune_freeze_setup(params, &prstate, new_relfrozen_xid, new_relmin_mxid, presult);
+
+	/*
+	 * Examine all line pointers and tuple visibility information to determine
+	 * which line pointers should change state and which tuples may be frozen.
+	 * Prepare queue of state changes to later be executed in a critical
+	 * section.
+	 */
+	prune_freeze_plan(&prstate, blockno, buffer, page, maxoff, off_loc, &tup);
+
+	/*
+	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+	 * an FPI to be emitted.
+	 */
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	do_prune = prstate.nredirected > 0 ||
 		prstate.ndead > 0 ||
@@ -963,16 +997,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
-	/*
-	 * After processing all the live tuples on the page, if the newest xmin
-	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
-	 */
-	if (prstate.all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
-		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
-		prstate.all_visible = prstate.all_frozen = false;
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-10-29 11:03  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-10-29 11:03 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, 15 Oct 2025 at 04:27, Melanie Plageman
<[email protected]> wrote:
>
> Thanks so much for the review! I've addressed all your feedback except
> what is commented on inline below.
> I've gone ahead and committed the preliminary patches that you thought
> were ready to commit.
>
> Attached v18 is what remains.
>
> 0001 - 0003: refactoring
> 0004 - 0006: finish eliminating XLOG_HEAP2_VISIBLE
> 0007 - 0009: refactoring
> 0010: Set VM on-access
> 0011: Set prune xid on insert
> 0012: Some refactoring for discussion
>
> For 0001, I got feedback heap_page_prune_and_freeze() has too many
> arguments, so I tried to address that. I'm interested to know if folks
> like this more.
>
> 0011 still needs a bit of investigation to understand fully if
> anything else in the index-killtuples test needs to be changed to make
> sure we have the same coverage.
>
> 0012 is sort of WIP. I got feedback heap_page_prune_and_freeze() was
> too long and should be split up into helpers. I want to know if this
> split makes sense. I can pull it down the patch stack if so.
>
> Only 0001 and 0012 are optional amongst the refactoring patches. The
> others are required to make on-access VM-setting possible or viable.
>
> On Thu, Oct 9, 2025 at 2:18 PM Andres Freund <[email protected]> wrote:
> >
> > > @@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
> > >       }
> > > -     action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
> > > -                                                                                (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
> > > -                                                                                &buffer);
> > > -     if (action == BLK_NEEDS_REDO)
> > > +     if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
> > > +                                                                       (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
> > > +                                                                       &buffer) == BLK_NEEDS_REDO)
> > >       {
> > >               Page            page = BufferGetPage(buffer);
> > >               OffsetNumber *redirected;
> >
> > Why move it around this way?
>
> Because there will be an action for the visibility map
> XLogReadBufferForRedoExtended(). I could have renamed it heap_action,
> but it is being used only in one place, so I preferred to just cut it
> to avoid any confusion.
>
> > > Advance the page LSN
> > > +              * only if the record could include an FPI, since recovery skips
> > > +              * records <= the stamped LSN. Otherwise it might skip an earlier FPI
> > > +              * needed to repair a torn page.
> > > +              */
> >
> > This is confusing, should probably just reference the stuff we did in the
> > !recovery case.
>
> I fixed this and addressed all your feedback related to this before committing.
>
> > > +             if (do_prune || nplans > 0 ||
> > > +                     ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
> > > +                     PageSetLSN(page, lsn);
> > > +
> > >               /*
> > >                * Note: we don't worry about updating the page's prunability hints.
> > >                * At worst this will cause an extra prune cycle to occur soon.
> > >                */
> >
> > Not your fault, but that seems odd? Why aren't we just doing the right thing?
>
> The comment dates back to 6f10eb2. I imagine no one ever bothered to
> fuss with extracting the XID. You could change
> heap_page_prune_execute() to return the right value -- though that's a
> bit ugly since it is used in normal operation as well as recovery.
>
> > I wonder if the VM specific redo portion should be in a common helper? Might
> > not be enough code to worry though...
>
> I think it might be more code as a helper at this point.
>
> > > @@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
> > >                                                        VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
> > >                                                        InvalidOffsetNumber);
> > >
> > > +     /*
> > > +      * Before marking dead items unused, check whether the page will become
> > > +      * all-visible once that change is applied.
> >
> > So the function is named _would_ but here you say will :)
>
> I thought about it more and still feel that this function name should
> contain "would". From vacuum's perspective it is "will" -- because it
> knows it will remove those dead items, but from the function's
> perspective it is hypothetical. I changed the comment though.
>
> > > +     if (heap_page_would_be_all_visible(vacrel, buffer,
> > > +                                                                        deadoffsets, num_offsets,
> > > +                                                                        &all_frozen, &visibility_cutoff_xid))
> > > +     {
> > > +             vmflags |= VISIBILITYMAP_ALL_VISIBLE;
> > > +             if (all_frozen)
> > > +             {
> > > +                     vmflags |= VISIBILITYMAP_ALL_FROZEN;
> > > +                     Assert(!TransactionIdIsValid(visibility_cutoff_xid));
> > > +             }
> > > +
> > > +             /* Take the lock on the vmbuffer before entering a critical section */
> > > +             LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
> >
> > It sure would be nice if we had documented the lock order between the heap
> > page and the corresponding VM page anywhere.  This is just doing what we did
> > before, so it's not this patch's fault, but I did get worried about it for a
> > moment.
>
> Well, the comment above the visibilitymap_set* functions says what
> expectations they have for the heap page being locked.
>
> > > +static bool
> > > +heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
> > > +                                                        OffsetNumber *deadoffsets,
> > > +                                                        int ndeadoffsets,
> > > +                                                        bool *all_frozen,
> > > +                                                        TransactionId *visibility_cutoff_xid)
> > > +{
> > >       Page            page = BufferGetPage(buf);
>
> > Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
> > the end?
>
> I was going to put an Assert(ndeadoffsets <= matched_dead_count), but
> then I started wondering if there is a way we could end up with fewer
> dead items than we collected during phase I.
>
> I had thought about if we dropped an index and then did on-access
> pruning -- but we don't allow setting LP_DEAD items LP_UNUSED in
> on-access pruning. So, maybe this is safe... I can do a follow-on
> commit to add the assert. But I'm just not 100% sure I've thought of
> all the cases where we might end up with fewer dead items.
>
> > > During vacuum's first and third phases, we examine tuples' visibility
> > > to determine if we can set the page all-visible in the visibility map.
> > >
> > > Previously, this check compared tuple xmins against a single XID chosen at
> > > the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> > > enables future work to set the VM during on-access pruning, since ordinary
> > > queries have access to GlobalVisState but not OldestXmin.
> > >
> > > This also benefits vacuum directly: GlobalVisState may advance
> > > during a vacuum, allowing more pages to become considered all-visible.
> > > In the rare case that it moves backward, VACUUM falls back to OldestXmin
> > > to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
> > > prunable according to the GlobalVisState.
> >
> > It could, but it currently won't advance in vacuum, right?
>
> I thought it was possible for it to advance when calling
> heap_prune_satisfies_vacuum() ->
> GlobalVisTestIsRemovableXid()->...GlobalVisUpdate(). This case isn't
> going to be common, but some things can cause us to update it.
>
> We have talked about explicitly updating GlobalVisState more often
> during vacuums of large tables. But I was under the impression that it
> was at least possible for it to advance during vacuum now.
>
> - Melanie


Hi!

First of all, I rechecked the v18 patches, and they still reduce WAL
volume. In a no-index vacuum case I see a 39% reduction in WAL bytes,
close to the numbers in your first message.

Here are my comments on the code. Some of them are nitpicks about minor
details, sorry for that.

In 0003:

The logic of get_conflict_xid() looks a bit strange to me: it assigns
conflict_xid some value, but at the very end we have

> + /*
> +  * We can omit the snapshot conflict horizon if we are not pruning or
> +  * freezing any tuples and are setting an already all-visible page
> +  * all-frozen in the VM. In this case, all of the tuples on the page must
> +  * already be visible to all MVCC snapshots on the standby.
> +  */
> + if (!do_prune && !do_freeze &&
> +     do_set_vm && blk_already_av && set_blk_all_frozen)
> +     conflict_xid = InvalidTransactionId;

I feel we should move this check to the beginning of the
function and simply return InvalidTransactionId from that branch.
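
A minimal standalone sketch of that shape, for illustration only. The
function name, its flattened argument list, and the tail that picks a
horizon are simplified placeholders and not the code from 0003; only the
condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Hypothetical, simplified signature: the function in 0003 takes different
 * arguments and derives its candidates from the prune state.
 */
static TransactionId
get_conflict_xid_sketch(bool do_prune, bool do_freeze, bool do_set_vm,
                        bool blk_already_av, bool set_blk_all_frozen,
                        TransactionId frz_conflict_horizon,
                        TransactionId latest_xid_removed)
{
    /*
     * Early exit: we are only marking an already all-visible page
     * all-frozen, so no snapshot conflict horizon is needed.
     */
    if (!do_prune && !do_freeze &&
        do_set_vm && blk_already_av && set_blk_all_frozen)
        return InvalidTransactionId;

    /*
     * Placeholder for the rest of the function: return the newer candidate.
     * Plain '>' ignores XID wraparound; real code would need a
     * wraparound-aware comparison such as TransactionIdFollows().
     */
    if (frz_conflict_horizon > latest_xid_removed)
        return frz_conflict_horizon;
    return latest_xid_removed;
}
```

With the check first, the remaining body never computes a value that is
then thrown away.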

In 0004:

> + if (old_vmbits == new_vmbits)
> + {
> + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> + /* Unset so we don't emit WAL since no change occurred */
> + do_set_vm = false;
> + }

and then

>  END_CRIT_SECTION();
> + if (do_set_vm)
> + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> +

So, in heap_page_prune_and_freeze() we release the VM buffer lock
both inside and outside the critical section. As far as I understand,
this is actually safe. I also looked at the WAL-logging code of other
access methods (GiST, GIN, ...), and some of them release buffers
before leaving the critical section and some after.
Still, I suggest following the 'Write-Ahead Log Coding'
section of
src/backend/access/transam/README, which says:

6. END_CRIT_SECTION()

7. Unlock and unpin the buffer(s).

Let's be consistent about this, at least within this one function.


In 0010:

I'm not entirely convinced that adding SO_ALLOW_VM_SET to the table AM
ScanOptions is the right thing to do. VM bits seem to be something
that makes sense for the heap AM but not for every table AM. So, don't
we break a layer of abstraction here? Would it be better for the heap AM
to set some flag in heap_beginscan?


Overall, 0001-0003 look mostly fine to me, and 0004-0006 are the right
thing to do IMHO, but they may need more review from other hackers.
I did not review the other patches in great detail; I will return to
them later.



-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-04 16:48  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-11-04 16:48 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the review!

On Wed, Oct 29, 2025 at 7:03 AM Kirill Reshke <[email protected]> wrote:
>
> The logic of get_conflict_xid() looks a bit strange to me: it assigns
> conflict_xid some value, but at the very end we have
>
> > + /*
> > +  * We can omit the snapshot conflict horizon if we are not pruning or
> > +  * freezing any tuples and are setting an already all-visible page
> > +  * all-frozen in the VM. In this case, all of the tuples on the page must
> > +  * already be visible to all MVCC snapshots on the standby.
> > +  */
> > + if (!do_prune && !do_freeze &&
> > +     do_set_vm && blk_already_av && set_blk_all_frozen)
> > +     conflict_xid = InvalidTransactionId;
>
> I feel we should move this check to the beginning of the
> function and simply return InvalidTransactionId from that branch.

You're right. I've changed it as you suggest in attached v19.

> > + if (old_vmbits == new_vmbits)
> > + {
> > + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> > + /* Unset so we don't emit WAL since no change occurred */
> > + do_set_vm = false;
> > + }
>
> and then
>
> >  END_CRIT_SECTION();
> > + if (do_set_vm)
> > + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> > +
>
> So, in heap_page_prune_and_freeze() we release the VM buffer lock
> both inside and outside the critical section. As far as I understand,
> this is actually safe. I also looked at the WAL-logging code of other
> access methods (GiST, GIN, ...), and some of them release buffers
> before leaving the critical section and some after.
> Still, I suggest following the 'Write-Ahead Log Coding'
> section of
> src/backend/access/transam/README, which says:
>
> 6. END_CRIT_SECTION()
>
> 7. Unlock and unpin the buffer(s).
>
> Let's be consistent about this, at least within this one function.

I see what you are saying. However, I don't see a good way to
determine whether or not we need to unlock the VM without introducing
another local variable in the outermost scope -- like "unlock_vm".
This function already has a lot of local variables, so I'm loath to do
that. And we want do_set_vm to reflect whether or not we actually set
it in case it gets used in the future.

This function doesn't lock or unlock the heap buffer so it doesn't
seem as urgent to me to follow the letter of the law in this case.

Attached patch doesn't have this change, but this is what it would look like:

    /* Lock vmbuffer before entering a critical section */
+   unlock_vm = do_set_vm;
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

@@ -1112,12 +1114,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
            old_vmbits = visibilitymap_set(blockno,
                                           vmbuffer, new_vmbits,
                                           params->relation->rd_locator);
-           if (old_vmbits == new_vmbits)
-           {
-               LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-               /* Unset so we don't emit WAL since no change occurred */
-               do_set_vm = false;
-           }
+
+           /* Unset so we don't emit WAL since no change occurred */
+           do_set_vm = old_vmbits != new_vmbits;
        }

        /*
@@ -1145,7 +1144,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,

    END_CRIT_SECTION();

-   if (do_set_vm)
+   if (unlock_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
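
For illustration only, the control flow of that variant can be sketched as
a standalone program. Everything here is a mock: mock_lock_vm(),
mock_unlock_vm(), the counters, and sketch_prune_set_vm() are hypothetical
stand-ins for LockBuffer(), the critical-section macros, and the relevant
part of heap_page_prune_and_freeze().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int  vm_lock_depth = 0;          /* > 0 while the mock VM buffer is locked */
static bool in_crit_section = false;    /* mock for START/END_CRIT_SECTION() */
static bool unlocked_in_crit = false;   /* set if we ever unlock inside it */

static void
mock_lock_vm(void)
{
    vm_lock_depth++;
}

static void
mock_unlock_vm(void)
{
    assert(vm_lock_depth > 0);
    if (in_crit_section)
        unlocked_in_crit = true;
    vm_lock_depth--;
}

/* Returns whether WAL for the VM change would still be emitted. */
static bool
sketch_prune_set_vm(bool do_set_vm, uint8_t old_vmbits, uint8_t new_vmbits)
{
    /* Remember that we took the lock, independent of later changes. */
    bool        unlock_vm = do_set_vm;

    if (do_set_vm)
        mock_lock_vm();

    in_crit_section = true;
    /* No change to the bits means there is nothing to WAL-log. */
    if (do_set_vm)
        do_set_vm = (old_vmbits != new_vmbits);
    in_crit_section = false;

    /* Unlock only after leaving the critical section, per the README. */
    if (unlock_vm)
        mock_unlock_vm();

    return do_set_vm;
}
```

The cost is exactly the extra local: unlock_vm records "we hold the lock"
while do_set_vm keeps meaning "we actually changed the VM".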

> In 0010:
>
> I'm not entirely convinced that adding SO_ALLOW_VM_SET to the table AM
> ScanOptions is the right thing to do. VM bits seem to be something
> that makes sense for the heap AM but not for every table AM. So, don't
> we break a layer of abstraction here? Would it be better for the heap AM
> to set some flag in heap_beginscan?

I don't see another good way of doing it.

The information about whether or not the relation is modified in the
query is gathered during planning and saved in the plan. We need to
get that information to the scan descriptor, which is all we have when
we call heap_page_prune_opt() during the scan. The scan descriptor is
created by the table AM implementations of scan_begin(). The table AM
callbacks don't pass down the plan -- which makes sense; the scan
shouldn't know about the plan. They do pass down flags, so I thought
it made the most sense to add a flag. Note that I was able to avoid
modifying the actual table and index AM callbacks (scan_begin() and
ambeginscan()). I only made new wrappers that took "modifies_rel".

Now, it is true that referring to the VM is somewhat of a layering
violation. That said, other table AMs may be able to use the information
about whether the query modifies the relation -- which is really what
this flag represents. The ScanOptions are usually either a type or a
call to action, which is why I felt a bit uncomfortable calling it
something like SO_MODIFIES_REL -- which is less of an option and more a
piece of information. And it makes it sound like the scan modifies the
rel, which is not the case. I wonder if there is another solution. Or
maybe we call it SO_QUERY_MODIFIES_REL?
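
To make the plumbing concrete, here is a rough standalone sketch of the
idea. It is not the code in 0010: the SKETCH_* bit values, the
SketchScanDesc struct, and all function names are hypothetical, and it
assumes the flag is set only when the query does not modify the relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ScanOptions are defined elsewhere. */
enum
{
    SKETCH_SO_TYPE_SEQSCAN = 1 << 0,
    SKETCH_SO_ALLOW_VM_SET = 1 << 1
};

/* Stand-in for a scan descriptor: all the AM has at prune time. */
typedef struct SketchScanDesc
{
    uint32_t    flags;
} SketchScanDesc;

/* Stand-in for the AM's scan_begin() callback: it sees flags, not the plan. */
static SketchScanDesc
sketch_scan_begin(uint32_t flags)
{
    SketchScanDesc scan = {flags};

    return scan;
}

/*
 * Hypothetical wrapper taking "modifies_rel": it turns what the planner
 * knows into a flag, leaving the callback signature untouched.
 */
static SketchScanDesc
sketch_beginscan_modifies_rel(bool modifies_rel)
{
    uint32_t    flags = SKETCH_SO_TYPE_SEQSCAN;

    if (!modifies_rel)
        flags |= SKETCH_SO_ALLOW_VM_SET;

    return sketch_scan_begin(flags);
}

/* What the heap AM could check before trying to set the VM on access. */
static bool
sketch_scan_may_set_vm(const SketchScanDesc *scan)
{
    return (scan->flags & SKETCH_SO_ALLOW_VM_SET) != 0;
}
```

Whatever the flag ends up being called, only the wrapper needs to know
about modifies_rel; the AM sees a single bit in the scan descriptor.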

- Melanie


Attachments:

  [text/x-patch] v19-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch (12.8K, 2-v19-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch)
From 338f6e31bf029527a3898fee1fbe587e24de9f5f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v19 01/12] Refactor heap_page_prune_and_freeze() parameters
 into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters, and upcoming work to handle VM updates in this function will
add even more.

Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.

Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
 src/backend/access/heap/pruneheap.c  | 86 +++++++++++++---------------
 src/backend/access/heap/vacuumlazy.c | 16 ++++--
 src/include/access/heapam.h          | 62 ++++++++++++++++----
 src/tools/pgindent/typedefs.list     |  1 +
 4 files changed, 101 insertions(+), 64 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..450b2eb6494 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -258,15 +258,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
 			OffsetNumber dummy_off_loc;
+			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.relation = relation;
+			params.buffer = buffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
+
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			params.options = 0;
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -419,60 +427,43 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now.  The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set.  They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -486,10 +477,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +575,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +778,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = heap_page_will_freeze(relation, buffer,
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
@@ -838,7 +830,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +868,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61fe623cc60..e55be07cae4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1965,10 +1965,16 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
+	params.relation = rel;
+	params.buffer = buf;
+	params.reason = PRUNE_VACUUM_SCAN;
+	params.cutoffs = &vacrel->cutoffs;
+	params.vistest = vacrel->vistest;
+
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
 	 *
@@ -1984,12 +1990,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..b0b6d3552a6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It is used to set the WAL record
+	 * opcode which is used for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+	 * pruning.
+	 *
+	 * FREEZE indicates that we will also freeze tuples, and will return
+	 * 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
+	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
+	 * option is set. cutoffs->OldestXmin is also used to determine if dead
+	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 018b5919cf6..a384171de0d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2343,6 +2343,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0



  [text/x-patch] v19-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (5.3K, 3-v19-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
From f179a5493c9671f9c8eca9231292d3a48bf7153c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v19 02/12] Keep all_frozen updated in
 heap_page_prune_and_freeze

Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.

Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
---
 src/backend/access/heap/pruneheap.c  | 22 +++++++++++-----------
 src/backend/access/heap/vacuumlazy.c |  9 ++++-----
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 450b2eb6494..daa719fc2a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -361,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -784,6 +782,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  do_hint_prune,
 									  &prstate);
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -853,7 +853,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
+				if (prstate.all_frozen)
 					frz_conflict_horizon = prstate.visibility_cutoff_xid;
 				else
 				{
@@ -1418,7 +1418,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1440,7 +1440,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1453,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1472,7 +1472,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1490,7 +1490,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e55be07cae4..670a7424b15 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2075,6 +2074,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2180,11 +2180,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0
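
The invariant the diff above establishes (all_frozen is cleared whenever
all_visible is cleared, so callers may test all_frozen alone) can be
sketched roughly as follows. This is only an illustration under simplified
assumptions: SketchPruneState, sketch_clear_visibility() and
sketch_can_mark_all_frozen() are hypothetical stand-ins, not PostgreSQL
code, and the two VM arguments stand in for the real visibility map lookups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two PruneState flags discussed above. */
typedef struct SketchPruneState
{
	bool		all_visible;
	bool		all_frozen;
} SketchPruneState;

/* Clear both flags together so all_frozen never outlives all_visible. */
static void
sketch_clear_visibility(SketchPruneState *ps)
{
	ps->all_visible = ps->all_frozen = false;
}

/*
 * With the invariant in place, the caller only needs to look at all_frozen
 * when deciding whether an already all-visible page can be marked
 * all-frozen.  vm_all_visible and vm_all_frozen are assumed inputs that
 * represent the current VM bits for the block.
 */
static bool
sketch_can_mark_all_frozen(const SketchPruneState *ps,
						   bool vm_all_visible, bool vm_all_frozen)
{
	assert(!ps->all_frozen || ps->all_visible);
	return vm_all_visible && ps->all_frozen && !vm_all_frozen;
}
```

Because the flags are always cleared as a pair, the extra all_visible test
in the caller becomes redundant rather than load-bearing.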



  [text/x-patch] v19-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch (9.7K, 4-v19-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch)
From c27ab39e739acf7e96d1f7e81df91fc2b2b7fe43 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v19 03/12] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.

The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.

This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
---
 src/backend/access/heap/pruneheap.c | 117 ++++++++++++++--------------
 1 file changed, 57 insertions(+), 60 deletions(-)
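
The horizon calculation described in the commit message can be sketched in
isolation as below. This is a simplified illustration, not the patch's
code: SketchXid, the SKETCH_* constants and both functions are hypothetical
stand-ins, and sketch_xid_retreat() only imitates the wraparound-skipping
behavior of TransactionIdRetreat() on a plain 32-bit counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId and its special values. */
typedef uint32_t SketchXid;

#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/* Step back one XID, skipping over the reserved values below 3. */
static SketchXid
sketch_xid_retreat(SketchXid xid)
{
	do
	{
		xid--;
	} while (xid < SKETCH_FIRST_NORMAL_XID);

	return xid;
}

/*
 * Choose the conflict horizon for a record that freezes tuples.  If the
 * whole page will end up all-frozen, the newest xmin among its live tuples
 * is precise enough.  Otherwise fall back to a conservative value just
 * before oldest_xmin.
 */
static SketchXid
sketch_freeze_conflict_horizon(bool all_frozen,
							   SketchXid visibility_cutoff_xid,
							   SketchXid oldest_xmin)
{
	if (all_frozen)
		return visibility_cutoff_xid;

	return sketch_xid_retreat(oldest_xmin);
}
```

The point of computing this before the LP_DEAD adjustment is that the
all_frozen input still reflects the freeze decision, not the final VM
state.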

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index daa719fc2a1..ef8861022f1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,11 +138,11 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze() to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether to opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -175,7 +175,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
-								  PruneState *prstate);
+								  PruneState *prstate, TransactionId *frz_conflict_horizon);
 
 
 /*
@@ -308,7 +308,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * performs several pre-freeze checks.
  *
  * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon to use for the WAL record should we decide to
+ * freeze tuples.
  *
  * prstate is both an input and output parameter.
  *
@@ -320,7 +322,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -390,6 +393,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -434,10 +453,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
  * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set.  They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opporunistically freeze, to indicate if the
+ * VM bits can be set.  They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -473,6 +493,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
@@ -542,10 +563,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -780,7 +801,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
+
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
@@ -842,27 +880,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -888,30 +907,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
-- 
2.43.0



  [text/x-patch] v19-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (41.3K, 5-v19-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 02c1c23779aa22e05dde52bb02701398f5261654 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v19 04/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c |  37 ++-
 src/backend/access/heap/pruneheap.c   | 435 ++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c  | 207 +-----------
 src/include/access/heapam.h           |  43 ++-
 4 files changed, 422 insertions(+), 300 deletions(-)
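
The redo-side rule this patch introduces in heap_xlog_prune_freeze() (dirty
the heap page and advance its LSN only when the heap page itself changes)
can be summarized by the sketch below. It is an illustration under
simplified assumptions: SketchRedoDecision and sketch_redo_decide() are
hypothetical names, and the boolean arguments stand in for the values the
real code derives from the WAL record, the page header and
XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what redo does to the heap page. */
typedef struct SketchRedoDecision
{
	bool		set_pd_all_visible;
	bool		mark_buffer_dirty;
	bool		set_lsn;
} SketchRedoDecision;

static SketchRedoDecision
sketch_redo_decide(bool do_prune, int nplans,
				   bool record_sets_vm, bool page_is_all_visible,
				   bool hint_bits_need_wal)
{
	SketchRedoDecision d = {false, false, false};

	/* Pruning or freezing always modifies the heap page. */
	if (do_prune || nplans > 0)
		d.mark_buffer_dirty = d.set_lsn = true;

	/*
	 * The VM bit must never be set while PD_ALL_VISIBLE is clear, so set
	 * the page-level bit first if it is missing.  When it is already set,
	 * a VM-only update leaves the heap page untouched.
	 */
	if (record_sets_vm && !page_is_all_visible)
	{
		d.set_pd_all_visible = true;
		d.mark_buffer_dirty = true;
		if (hint_bits_need_wal)
			d.set_lsn = true;
	}

	return d;
}
```

For example, marking an already all-visible page all-frozen with no pruning
or freezing yields a decision with all three fields false, which matches the
comment in the patch that the heap page need not be dirtied in that case.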

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef8861022f1..8dafbd344d8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -133,16 +135,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -173,10 +176,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate, TransactionId *frz_conflict_horizon);
-
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -262,6 +276,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
+			params.vmbuffer = InvalidBuffer;
+			params.blk_known_av = false;
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -434,10 +450,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -452,12 +566,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * it's required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -482,6 +597,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -491,15 +607,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
 	prstate.mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = params->cutoffs;
 
 	/*
@@ -546,50 +669,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
+	 *
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * when we encounter LP_DEAD items. Instead, we correct all_visible after
+	 * deciding whether to freeze, but before updating the VM, to avoid
+	 * setting the VM bit incorrectly.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.attempt_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -821,6 +948,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -842,14 +997,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -863,35 +1021,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
+		{
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -901,28 +1067,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1413,6 +1598,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = prstate->all_frozen = false;
@@ -2058,6 +2245,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use that xmax as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2082,6 +2328,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2091,6 +2346,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2127,7 +2383,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2245,7 +2501,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 670a7424b15..60529fcf9d7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,13 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   TransactionId OldestXmin,
 										   OffsetNumber *deadoffsets,
@@ -1974,6 +1967,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	params.reason = PRUNE_VACUUM_SCAN;
 	params.cutoffs = &vacrel->cutoffs;
 	params.vistest = vacrel->vistest;
+	params.vmbuffer = vmbuffer;
+	params.blk_known_av = all_visible_according_to_vm;
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1990,7 +1985,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	params.options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -2013,33 +2008,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2073,168 +2041,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2956,6 +2782,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -3643,7 +3470,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId OldestXmin,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b0b6d3552a6..1471940b4a4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
 	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
 	 * pruning.
 	 *
-	 * FREEZE indicates that we will also freeze tuples, and will return
-	 * 'all_visible', 'all_frozen' flags to the caller.
+	 * FREEZE indicates that we will also freeze tuples.
+	 *
+	 * UPDATE_VIS indicates that we will set the page's status in the VM.
 	 */
 	int			options;
 
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -423,6 +431,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
@@ -433,6 +442,14 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
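
For readers following the conflict-horizon logic in get_conflict_xid() from the
patch above, here is a minimal standalone sketch of the same decision order. It
is not PostgreSQL code: SketchXid, SKETCH_INVALID_XID, SKETCH_FIRST_NORMAL_XID,
sketch_xid_follows() and sketch_conflict_xid() are hypothetical stand-ins for
TransactionId, InvalidTransactionId, FirstNormalTransactionId,
TransactionIdFollows() and get_conflict_xid(), and the comparison helper is a
simplified assumption about how modulo-2^32 XID comparison behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and its special values. */
typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/*
 * Simplified "a is newer than b" test: special XIDs compare as plain
 * unsigned integers, normal XIDs compare modulo 2^32.
 */
bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	if (a < SKETCH_FIRST_NORMAL_XID || b < SKETCH_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Mirrors the branch order of get_conflict_xid() in the patch. */
SketchXid
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					SketchXid latest_xid_removed,
					SketchXid frz_conflict_horizon,
					SketchXid visibility_cutoff_xid,
					bool blk_already_av, bool set_blk_all_frozen)
{
	SketchXid	conflict_xid = SKETCH_INVALID_XID;

	/* Only flipping an already all-visible page to all-frozen: no horizon. */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		return SKETCH_INVALID_XID;

	/* Setting the VM takes precedence over the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer removed xmax always wins. */
	if (sketch_xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* Sanity check: the result is never older than the removed xmax. */
	assert(!sketch_xid_follows(latest_xid_removed, conflict_xid));
	return conflict_xid;
}
```

The point of the sketch is the ordering: the early return covers the
VM-only, already-all-visible case, and the final comparison is applied no
matter which of the two earlier branches supplied the starting value.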


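The counter bookkeeping that lazy_scan_prune() now derives from
presult.old_vmbits and presult.new_vmbits in the patch above can be sketched in
isolation as follows. This is an illustration under assumptions, not the
patch's code: SketchVmCounters and sketch_count_vm_change() are hypothetical
names, the two bit macros mirror VISIBILITYMAP_ALL_VISIBLE and
VISIBILITYMAP_ALL_FROZEN, and the function returns the value the patch stores
through *vm_page_frozen instead of writing through a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_ALL_VISIBLE	0x01
#define SKETCH_ALL_FROZEN	0x02

/* Hypothetical stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct SketchVmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} SketchVmCounters;

/*
 * Update the counters from the VM bits before and after pruning.
 * Returns true if the page was newly set all-frozen.
 */
bool
sketch_count_vm_change(uint8_t old_vmbits, uint8_t new_vmbits,
					   SketchVmCounters *counters)
{
	bool		page_frozen = false;

	if ((old_vmbits & SKETCH_ALL_VISIBLE) == 0 &&
		(new_vmbits & SKETCH_ALL_VISIBLE) != 0)
	{
		/* Newly all-visible, and possibly all-frozen at the same time. */
		counters->new_visible_pages++;
		if ((new_vmbits & SKETCH_ALL_FROZEN) != 0)
		{
			counters->new_visible_frozen_pages++;
			page_frozen = true;
		}
	}
	else if ((old_vmbits & SKETCH_ALL_FROZEN) == 0 &&
			 (new_vmbits & SKETCH_ALL_FROZEN) != 0)
	{
		/* Already all-visible, newly all-frozen. */
		assert((new_vmbits & SKETCH_ALL_VISIBLE) != 0);
		counters->new_frozen_pages++;
		page_frozen = true;
	}

	return page_frozen;
}
```

Because both bitmasks come back from pruning, the caller no longer needs to
consult all_visible_according_to_vm or re-read the VM to keep these counters
accurate.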

  [text/x-patch] v19-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 6-v19-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 90ef5b185e4940f8b6eab291460fe7115d0e7080 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v19 05/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 60529fcf9d7..8c402b5b1d4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,9 +1879,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1898,13 +1901,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v19-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.2K, 7-v19-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 25216d74c5ded740fb5b52beb74877c7a701c3b0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v19 06/12] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record type.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 111 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 54 insertions(+), 378 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 36fee9c994e..a0c5923a563 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8dafbd344d8..14690cd62ae 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1031,9 +1031,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2309,14 +2309,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8c402b5b1d4..ff6f0d1d0af 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1901,11 +1901,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2787,9 +2787,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2f5e61e2392..a75b5bb6b13 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
@@ -344,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a384171de0d..6b4a40f616c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4281,7 +4281,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
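
Reader's note on the visibilitymap.c hunks in the patch above: both the removed and the surviving visibilitymap_set() locate a heap block's two VM bits by byte and bit offset, then OR the flags in and return the previous status. The following is a minimal standalone sketch of that addressing only. It assumes a single flat byte array in place of VM pages, and it omits locking, dirtying, and WAL. All toy_* and TOY_* names are made up for illustration and are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy constants mirroring the 0x01 / 0x02 / 0x03 flag values in the patch. */
#define TOY_BITS_PER_HEAPBLOCK	2	/* all-visible + all-frozen */
#define TOY_HEAPBLOCKS_PER_BYTE	4	/* 8 bits / 2 bits per heap block */
#define TOY_ALL_VISIBLE			0x01
#define TOY_ALL_FROZEN			0x02
#define TOY_VALID_BITS			0x03

/*
 * Set VM flags for heap_blk in a flat toy map and return the bits that were
 * set beforehand. Hypothetical helper; the real function works on a pinned,
 * locked VM buffer.
 */
uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (uint8_t) ((heap_blk % TOY_HEAPBLOCKS_PER_BYTE) *
										TOY_BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* only the two defined bits may be passed */
	assert((flags & TOY_VALID_BITS) == flags);
	/* never set all-frozen without all-visible */
	assert(flags != TOY_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & TOY_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Callers in the patch compare the returned old bits with the new ones (old_vmbits == new_vmbits) to detect that nothing changed.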



  [text/x-patch] v19-0007-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.2K, 8-v19-0007-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 44e0d30a3672a28187e0fb1da014f05747c00d29 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v19 07/12] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order
to decide whether a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 16 ++++++++--------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)
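
Reader's note: the 32-bit wrapper renamed here first widens the xid to a 64-bit value relative to a reference point (FullXidRelativeTo() in the hunk below) and then calls the FullTransactionId variant. The following is a minimal standalone sketch of that widening step. It uses plain integers instead of FullTransactionId and assumes the xid lies within 2^31 of the reference. toy_full_xid_relative_to() is a made-up name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: widen a 32-bit xid to 64 bits by interpreting it relative to a
 * 64-bit reference value. Hypothetical helper for illustration only.
 */
uint64_t
toy_full_xid_relative_to(uint64_t rel, uint32_t xid)
{
	uint32_t	rel_xid = (uint32_t) rel;	/* low 32 bits of the reference */
	int32_t		diff = (int32_t) (xid - rel_xid);	/* signed distance mod 2^32 */

	/* the reference must be large enough that stepping back stays >= 0 */
	assert(diff >= 0 || rel >= (uint64_t) (-(int64_t) diff));

	return rel + (uint64_t) (int64_t) diff;
}
```

This is why the wrapper's comment insists on xids from a wraparound-protected source: an xid further than 2^31 from the reference would be widened into the wrong epoch.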

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 14690cd62ae..d03b754b2cc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -235,7 +235,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -730,9 +730,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1157,11 +1157,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1618,7 +1618,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
+				 * could use GlobalVisXidVisibleToAll() instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v19-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 9-v19-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
  download | inline diff:
From 7e8926fc6256dc0966b0c65e4fcec0031fbd2988 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v19 08/12] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
 src/backend/access/heap/pruneheap.c         | 37 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 17 +++++-----
 src/include/access/heapam.h                 |  7 ++--
 4 files changed, 57 insertions(+), 32 deletions(-)
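
Reader's note: the commit message above defers the GlobalVisState test until all tuples on the page have been scanned, then tests only the newest xmin. The following is a minimal standalone sketch of that pattern. It assumes plain integer xids with no wraparound and a single fixed horizon. ToyVisState and the toy_* functions are made up for illustration and are not PostgreSQL's types or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID 0

/* Toy stand-in for a visibility horizon; counts how often it is consulted. */
typedef struct ToyVisState
{
	ToyXid		horizon;	/* xids strictly below this are visible to all */
	int			nchecks;	/* number of (notionally expensive) checks made */
} ToyVisState;

bool
toy_xid_visible_to_all(ToyVisState *state, ToyXid xid)
{
	state->nchecks++;
	return xid < state->horizon;
}

/*
 * Decide whether a "page" of tuple xmins is all-visible: track the newest
 * xmin with cheap comparisons, then consult the horizon once at the end.
 */
bool
toy_page_all_visible(ToyVisState *state, const ToyXid *xmins, size_t ntuples)
{
	ToyXid		newest = TOY_INVALID_XID;

	assert(xmins != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		if (xmins[i] > newest)
			newest = xmins[i];
	}

	/* no tuples with a valid xmin: nothing to test against the horizon */
	if (newest == TOY_INVALID_XID)
		return true;

	/* if the newest xmin is visible to all, every older one is as well */
	return toy_xid_visible_to_all(state, newest);
}
```

The trade-off matches the one described above: every xmin on the page is examined, but the horizon is consulted at most once per page.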

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d03b754b2cc..d3c57eedfe3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -712,11 +712,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -912,6 +913,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1084,10 +1095,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1615,19 +1625,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisXidVisibleToAll() instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ff6f0d1d0af..5e3c1d50378 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2741,7 +2741,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3496,14 +3496,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3524,7 +3523,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3543,7 +3542,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3617,7 +3616,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3636,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1471940b4a4..4fc6edf4261 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
 	/*
 	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
 	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
-	 * option is set. cutoffs->OldestXmin is also used to determine if dead
-	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 * option is set.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void heap_vacuum_rel(Relation rel,
 
 #ifdef USE_ASSERT_CHECKING
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -457,6 +456,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v19-0009-Unset-all_visible-sooner-if-not-freezing.patch (2.4K, 10-v19-0009-Unset-all_visible-sooner-if-not-freezing.patch)
From 675c8aa69fd456dfee011d40c913f33cd866cab6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v19 09/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
 src/backend/access/heap/pruneheap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d3c57eedfe3..7f457abf8e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1483,8 +1483,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1739,8 +1742,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v19-0010-Allow-on-access-pruning-to-set-pages-all-visible.patch (28.3K, 11-v19-0010-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 85792e88a836a7909a27783b1801e1da0e51399e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v19 10/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 73 +++++++++++++++----
 src/backend/access/index/indexam.c            | 46 ++++++++++++
 src/backend/access/table/tableam.c            | 39 +++++++++-
 src/backend/executor/execMain.c               |  4 +
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 ++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 +++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 +++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 284 insertions(+), 39 deletions(-)
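
Editor's sketch of two decisions made by this patch, as a toy model rather than PostgreSQL code. The TOY_* flag values and toy_* functions are hypothetical; the real SO_* flag bits are not reproduced here. The first function mirrors how the scan flags are built: VM setting is allowed only when the query does not modify the relation. The second mirrors the on-access heuristic added to heap_page_will_set_vis(): when nothing is pruned or frozen, skip setting the VM if doing so would newly dirty the heap page or force a full-page image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define TOY_SO_TYPE_SEQSCAN	(1u << 0)
#define TOY_SO_ALLOW_VM_SET	(1u << 1)

/* Only scans of relations the query does not modify may set the VM. */
static uint32_t
toy_scan_flags(bool modifies_rel)
{
	uint32_t	flags = TOY_SO_TYPE_SEQSCAN;

	if (!modifies_rel)
		flags |= TOY_SO_ALLOW_VM_SET;
	return flags;
}

/*
 * Returns true when an on-access call should give up on setting the VM.
 * page_dirty and needs_fpi stand in for BufferIsDirty() and
 * XLogCheckBufferNeedsBackup() on the heap buffer.
 */
static bool
toy_skip_vm_set(bool on_access, bool all_visible,
				bool do_prune, bool do_freeze,
				bool page_dirty, bool needs_fpi)
{
	return on_access &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!page_dirty || needs_fpi);
}
```

In this model a clean page is never dirtied solely to set the VM on access, and an already-dirty page is skipped only when the WAL record would need to carry a full-page image.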

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a0c5923a563..260f981d457 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7f457abf8e1..631eb45bc96 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -188,7 +188,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -203,9 +205,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -271,12 +277,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.options = 0;
+			params.vmbuffer = InvalidBuffer;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			params.relation = relation;
 			params.buffer = buffer;
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
-			params.vmbuffer = InvalidBuffer;
 			params.blk_known_av = false;
 
 			/*
@@ -456,6 +471,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -466,7 +484,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -482,6 +502,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -505,6 +542,11 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * This will never trigger for on-access pruning because it couldn't have
+	 * done a previous visibility map lookup and thus will always pass
+	 * blk_known_av as false. A future vacuum will have to take care of fixing
+	 * the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -913,6 +955,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -923,14 +973,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -974,6 +1016,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2250,7 +2293,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2320,8 +2363,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..cbd1ecaa15f 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4fc6edf4261..1d2cab64e9c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v19-0011-Set-pd_prune_xid-on-insert.patch (6.7K, 12-v19-0011-Set-pd_prune_xid-on-insert.patch)
From e6b5d40c8c8758179ccd2ffe6e60dfc725430c12 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v19 11/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
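Note: a minimal, self-contained sketch of the hint logic described above.
It is a toy model, not PostgreSQL code: ToyPage, toy_page_set_prunable()
and toy_insert() are hypothetical names, and only the pd_prune_xid field
is modeled. It assumes PageSetPrunable() keeps the oldest normal XID it
has been given, and it mirrors the TransactionIdIsNormal() guard that
heap_insert() uses to skip bootstrap mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Toy stand-in for a heap page header; only the prune hint is modeled. */
typedef struct ToyPage
{
	TransactionId pd_prune_xid;
} ToyPage;

/* Bootstrap (1) and frozen (2) XIDs are not "normal". */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware comparison; only meaningful for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Remember the oldest XID whose tuples might become prunable. */
static void
toy_page_set_prunable(ToyPage *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* Insert path: set the hint unless the inserting XID is not normal. */
static void
toy_insert(ToyPage *page, TransactionId xid)
{
	if (xid_is_normal(xid))
		toy_page_set_prunable(page, xid);
}
```

In this model, an insert with XID 1 (bootstrap) leaves the hint unset,
while inserts with XIDs 100, 200 and then 50 leave pd_prune_xid at 50.
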
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 260f981d457..eea3a3d2ddc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



  [text/x-patch] v19-0012-Split-heap_page_prune_and_freeze-into-helpers.patch (17.8K, 13-v19-0012-Split-heap_page_prune_and_freeze-into-helpers.patch)
From 1ff6d727d64771ed19e07e6d1644380e16508944 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 18:45:45 -0400
Subject: [PATCH v19 12/12] Split heap_page_prune_and_freeze into helpers

ci-os-only:
---
 src/backend/access/heap/pruneheap.c | 316 +++++++++++++++-------------
 1 file changed, 170 insertions(+), 146 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 631eb45bc96..e18ec37fdf5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -590,82 +590,20 @@ heap_page_will_set_vis(Relation relation,
 	return do_set_vm;
 }
 
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page. If the page's visibility status has changed, update it in
- * the VM.
- *
- * Caller must have pin and buffer cleanup lock on the page.  Note that we
- * don't update the FSM information for page on caller's behalf.  Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
- * it's required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.
- *
- * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
- * the page has changed, we will update the VM at the same time as pruning and
- * freezing the heap page. We will also update presult->old_vmbits and
- * presult->new_vmbits with the state of the VM before and after updating it
- * for the caller to use in bookkeeping.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.  Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far.  They will be updated
- * with oldest values present on the page after pruning.  After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
-						   PruneFreezeResult *presult,
-						   OffsetNumber *off_loc,
-						   TransactionId *new_relfrozen_xid,
-						   MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params, PruneState *prstate,
+				   TransactionId *new_relfrozen_xid,
+				   MultiXactId *new_relmin_mxid,
+				   PruneFreezeResult *presult)
 {
-	Buffer		buffer = params->buffer;
-	Buffer		vmbuffer = params->vmbuffer;
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber offnum,
-				maxoff;
-	PruneState	prstate;
-	HeapTupleData tup;
-	bool		do_freeze;
-	bool		do_prune;
-	bool		do_hint_prune;
-	bool		do_set_vm;
-	bool		do_set_pd_vis;
-	bool		did_tuple_hint_fpi;
-	int64		fpi_before = pgWalUsage.wal_fpi;
-	TransactionId frz_conflict_horizon = InvalidTransactionId;
-	TransactionId conflict_xid = InvalidTransactionId;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
-
 	/* Copy parameters to prstate */
-	prstate.vistest = params->vistest;
-	prstate.mark_unused_now =
+	prstate->vistest = params->vistest;
+	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.attempt_update_vm =
+	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
 		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
-	prstate.cutoffs = params->cutoffs;
+	prstate->cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -678,37 +616,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * prunable, we will save the lowest relevant XID in new_prune_xid. Also
 	 * initialize the rest of our working state.
 	 */
-	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.latest_xid_removed = InvalidTransactionId;
-	prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
-	prstate.nroot_items = 0;
-	prstate.nheaponly_items = 0;
+	prstate->new_prune_xid = InvalidTransactionId;
+	prstate->latest_xid_removed = InvalidTransactionId;
+	prstate->nredirected = prstate->ndead = prstate->nunused = prstate->nfrozen = 0;
+	prstate->nroot_items = 0;
+	prstate->nheaponly_items = 0;
 
 	/* initialize page freezing working state */
-	prstate.pagefrz.freeze_required = false;
-	if (prstate.attempt_freeze)
+	prstate->pagefrz.freeze_required = false;
+	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
-		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
-		prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+		prstate->pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
 	}
 	else
 	{
 		Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
-		prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
 	}
 
-	prstate.ndeleted = 0;
-	prstate.live_tuples = 0;
-	prstate.recently_dead_tuples = 0;
-	prstate.hastup = false;
-	prstate.lpdead_items = 0;
-	prstate.deadoffsets = presult->deadoffsets;
+	prstate->ndeleted = 0;
+	prstate->live_tuples = 0;
+	prstate->recently_dead_tuples = 0;
+	prstate->hastup = false;
+	prstate->lpdead_items = 0;
+	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
 	 * Track whether the page could be marked all-visible and/or all-frozen.
@@ -736,20 +674,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * bookkeeping. In this case, initializing all_visible to false allows
 	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
-	if (prstate.attempt_freeze)
+	if (prstate->attempt_freeze)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = true;
+		prstate->all_visible = true;
+		prstate->all_frozen = true;
 	}
-	else if (prstate.attempt_update_vm)
+	else if (prstate->attempt_update_vm)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = false;
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
 	}
 	else
 	{
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
 	}
 
 	/*
@@ -761,10 +699,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * used to calculate the snapshot conflict horizon when updating the VM
 	 * and/or freezing all the tuples on the page.
 	 */
-	prstate.visibility_cutoff_xid = InvalidTransactionId;
+	prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
 
-	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(params->relation);
+static void
+prune_freeze_plan(PruneState *prstate, BlockNumber blockno, Buffer buffer, Page page,
+				  OffsetNumber maxoff, OffsetNumber *off_loc, HeapTuple tup)
+{
+	OffsetNumber offnum;
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -799,13 +741,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		*off_loc = offnum;
 
-		prstate.processed[offnum] = false;
-		prstate.htsv[offnum] = -1;
+		prstate->processed[offnum] = false;
+		prstate->htsv[offnum] = -1;
 
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
 			continue;
 		}
 
@@ -815,17 +757,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * If the caller set mark_unused_now true, we can set dead line
 			 * pointers LP_UNUSED now.
 			 */
-			if (unlikely(prstate.mark_unused_now))
-				heap_prune_record_unused(&prstate, offnum, false);
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
 			continue;
 		}
 
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* This is the start of a HOT chain */
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 			continue;
 		}
 
@@ -835,25 +777,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Get the tuple's visibility status and queue it up for processing.
 		 */
 		htup = (HeapTupleHeader) PageGetItem(page, itemid);
-		tup.t_data = htup;
-		tup.t_len = ItemIdGetLength(itemid);
-		ItemPointerSet(&tup.t_self, blockno, offnum);
+		tup->t_data = htup;
+		tup->t_len = ItemIdGetLength(itemid);
+		ItemPointerSet(&tup->t_self, blockno, offnum);
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, tup,
+															buffer);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 		else
-			prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+			prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
 	}
 
-	/*
-	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
-	 * an FPI to be emitted.
-	 */
-	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
 	/*
 	 * Process HOT chains.
 	 *
@@ -865,30 +801,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * the page instead of using the root_items array, also did it in
 	 * ascending offset number order.)
 	 */
-	for (int i = prstate.nroot_items - 1; i >= 0; i--)
+	for (int i = prstate->nroot_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.root_items[i];
+		offnum = prstate->root_items[i];
 
 		/* Ignore items already processed as part of an earlier chain */
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
 	}
 
 	/*
 	 * Process any heap-only tuples that were not already processed as part of
 	 * a HOT chain.
 	 */
-	for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+	for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.heaponly_items[i];
+		offnum = prstate->heaponly_items[i];
 
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
@@ -907,7 +843,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * return true for an XMIN_INVALID tuple, so this code will work even
 		 * when there were sequential updates within the aborted transaction.)
 		 */
-		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+		if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
 		{
 			ItemId		itemid = PageGetItemId(page, offnum);
 			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -915,8 +851,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
 			{
 				HeapTupleHeaderAdvanceConflictHorizon(htup,
-													  &prstate.latest_xid_removed);
-				heap_prune_record_unused(&prstate, offnum, true);
+													  &prstate->latest_xid_removed);
+				heap_prune_record_unused(prstate, offnum, true);
 			}
 			else
 			{
@@ -933,7 +869,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -944,12 +880,110 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	{
 		*off_loc = offnum;
 
-		Assert(prstate.processed[offnum]);
+		Assert(prstate->processed[offnum]);
 	}
 #endif
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate->all_visible &&
+		TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate->vistest, prstate->visibility_cutoff_xid))
+		prstate->all_visible = prstate->all_frozen = false;
+
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
+ *
+ * Caller must have pin and buffer cleanup lock on the page.  Note that we
+ * don't update the FSM information for page on caller's behalf.  Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now.  The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.  Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+						   PruneFreezeResult *presult,
+						   OffsetNumber *off_loc,
+						   TransactionId *new_relfrozen_xid,
+						   MultiXactId *new_relmin_mxid)
+{
+	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
+	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	OffsetNumber maxoff;
+	PruneState	prstate;
+	HeapTupleData tup;
+	bool		do_freeze;
+	bool		do_prune;
+	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
+	bool		did_tuple_hint_fpi;
+	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
+	maxoff = PageGetMaxOffsetNumber(page);
+	tup.t_tableOid = RelationGetRelid(params->relation);
+
+	/* Initialize needed state in prstate */
+	prune_freeze_setup(params, &prstate, new_relfrozen_xid, new_relmin_mxid, presult);
+
+	/*
+	 * Examine all line pointers and tuple visibility information to determine
+	 * which line pointers should change state and which tuples may be frozen.
+	 * Prepare queue of state changes to later be executed in a critical
+	 * section.
+	 */
+	prune_freeze_plan(&prstate, blockno, buffer, page, maxoff, off_loc, &tup);
+
+	/*
+	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+	 * an FPI to be emitted.
+	 */
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	do_prune = prstate.nredirected > 0 ||
 		prstate.ndead > 0 ||
@@ -963,16 +997,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
-	/*
-	 * After processing all the live tuples on the page, if the newest xmin
-	 * amongst them is not visible to everyone, the page cannot be
-	 * all-visible.
-	 */
-	if (prstate.all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
-		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
-		prstate.all_visible = prstate.all_frozen = false;
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-17 23:07  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-11-17 23:07 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Attached v20 has general cleanup, changes to the table/index AM
callbacks detailed below, and it moves the
heap_page_prune_and_freeze() refactoring commit down the stack to
0004.

0001 - 0003 are fairly trivial cleanup patches. I think they are ready
to commit, so if I don't hear any objections in the next few days,
I'll go ahead and commit them.

On Tue, Nov 4, 2025 at 11:48 AM Melanie Plageman
<[email protected]> wrote:
>
> On Wed, Oct 29, 2025 at 7:03 AM Kirill Reshke <[email protected]> wrote:
> >
> > In 0010:
> >
> > I'm not terribly convinced that adding SO_ALLOW_VM_SET to TAM
> > ScanOptions is the right thing to do. Looks like VM bits are something
> > that makes sense for HEAP AM but not for any TAM. So, don't we break
> > some layer of abstraction here? Would it be better for HEAP AM to set
> > some flags in heap_beginscan?
>
> I don't see another good way of doing it.
>
> The information about whether or not the relation is modified in the
> query is gathered during planning and saved in the plan. We need to
> get that information to the scan descriptor, which is all we have when
> we call heap_page_prune_opt() during the scan. The scan descriptor is
> created by the table AM implementations of scan_begin(). The table AM
> callbacks don't pass down the plan -- which makes sense; the scan
> shouldn't know about the plan. They do pass down flags, so I thought
> it made the most sense to add a flag. Note that I was able to avoid
> modifying the actual table and index AM callbacks (scan_begin() and
> ambeginscan()). I only made new wrappers that took "modifies_rel".
>
> Now, it is true that referring to the VM is somewhat of a layering
> violation. Though, other table AMs may use the information about whether
> the query modifies the relation -- which is really what this flag
> represents. The ScanOptions are usually either a type or a call to
> action. Which is why I felt a bit uncomfortable calling it something
> like SO_MODIFIES_REL -- which is less of an option and more a piece of
> information. And it makes it sound like the scan modifies the rel,
> which is not the case. I wonder if there is another solution. Or maybe
> we call it SO_QUERY_MODIFIES_REL?

Attached v20 changes the ScanOption name to SO_HINT_REL_READ_ONLY and
removes the new helper functions which took modifies_rel as a
parameter. Instead it modifies the existing
table_beginscan()/index_beginscan() helpers and the relevant callbacks
they invoke to take a new flags parameter. These are additional
caller-provided flags.
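
To make the shape of that change concrete, here is a minimal standalone
sketch of a caller-provided flag being ORed into the scan options and later
consulted at prune time. Everything prefixed with sketch_ or SKETCH_ is a
hypothetical stand-in invented for illustration. The bit values, the struct,
and the helper signatures are assumptions and do not match the real
table_beginscan()/heap_beginscan() code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only (not the patch's values). */
#define SKETCH_SO_TYPE_SEQSCAN			(1u << 0)
#define SKETCH_SO_HINT_REL_READ_ONLY	(1u << 1)

/* Stand-in for a scan descriptor; the real one has many more fields. */
typedef struct SketchScanDesc
{
	uint32_t	flags;
} SketchScanDesc;

/*
 * Stand-in for a beginscan helper: it sets the flags it owns internally
 * and ORs in whatever additional flags the caller provided.
 */
static SketchScanDesc
sketch_beginscan(uint32_t caller_flags)
{
	SketchScanDesc scan = {.flags = SKETCH_SO_TYPE_SEQSCAN | caller_flags};

	return scan;
}

/*
 * Stand-in for the check made during on-access pruning: only consider
 * setting the VM if the caller hinted that the query does not modify
 * the relation.
 */
static bool
sketch_may_set_vm(const SketchScanDesc *scan)
{
	assert(scan != NULL);
	return (scan->flags & SKETCH_SO_HINT_REL_READ_ONLY) != 0;
}
```

In this sketch a caller that knows the query is read-only would call
sketch_beginscan(SKETCH_SO_HINT_REL_READ_ONLY), and every other caller
would pass 0. The scan never sees the plan; it only sees the flag.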

In master, the IndexScan structures and helpers don't use ScanOptions,
but since I'm using them for properties of the base relation, I think
it is fine. I'm not sure if I should name the parameter base_rel_flags
instead of flags for the index-related callbacks and helpers or if
leaving it more generic is better, though.

- Melanie


Attachments:

  [text/x-patch] v20-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch (13.0K, 2-v20-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch)
From 12ecb9ef685f2fed0d741f91d6fc6a6a9f959c80 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v20 01/12] Refactor heap_page_prune_and_freeze() parameters
 into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters and upcoming work to handle VM updates in this function will
add even more.

Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.

Author: Melanie Plageman <[email protected]>
Suggested-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
 src/backend/access/heap/pruneheap.c  | 91 +++++++++++++---------------
 src/backend/access/heap/vacuumlazy.c | 12 ++--
 src/include/access/heapam.h          | 63 +++++++++++++++----
 src/tools/pgindent/typedefs.list     |  1 +
 4 files changed, 100 insertions(+), 67 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..e9e14cb42b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -261,12 +261,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeResult presult;
 
 			/*
-			 * For now, pass mark_unused_now as false regardless of whether or
-			 * not the relation has indexes, since we cannot safely determine
-			 * that during on-access pruning with the current implementation.
+			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
+			 * regardless of whether or not the relation has indexes, since we
+			 * cannot safely determine that during on-access pruning with the
+			 * current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+				.reason = PRUNE_ON_ACCESS,.options = 0,
+				.vistest = vistest,.cutoffs = NULL
+			};
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
+									   NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen on exit, to indicate if the VM bits can be set.
+ * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
+ * passed, because at the moment only callers that also freeze need that
+ * information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -486,10 +476,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +574,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +777,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = heap_page_will_freeze(relation, buffer,
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
@@ -838,7 +829,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +867,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index deb9a3dc0d1..2b9e5c7f81b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1965,7 +1965,10 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params = {.relation = rel,.buffer = buf,
+		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+		.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+	};
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -1984,12 +1987,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..632c4332a8c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,56 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It is used to set the WAL record
+	 * opcode which is used for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
+	 * LP_UNUSED during pruning.
+	 *
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
+	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * Contains the cutoffs used for freezing. They are required if the
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
+	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
+	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
+	 * calculates them once, at the beginning of vacuuming the relation.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +314,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +410,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bce72ae64..8698918f443 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0



  [text/x-patch] v20-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (9.1K, 3-v20-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
From 55975c548eb8fa66be84fa8c1f41ee723549814b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v20 02/12] Keep all_frozen updated in
 heap_page_prune_and_freeze

Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.

Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.

Author: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 60 +++++++++++++++-------------
 src/backend/access/heap/vacuumlazy.c |  9 ++---
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e9e14cb42b7..7cd51c7be33 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -359,8 +355,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -544,9 +542,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
 	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * all_visible and all_frozen when we see LP_DEAD items.  We fix that at
+	 * the end of the function, when we return the value to the caller, so
+	 * that the caller doesn't set the VM bits incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -783,6 +781,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  do_hint_prune,
 									  &prstate);
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -852,7 +852,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
+				if (prstate.all_frozen)
 					frz_conflict_horizon = prstate.visibility_cutoff_xid;
 				else
 				{
@@ -889,16 +889,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
 
 	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
+	 * earlier on to make the choice of whether or not to freeze the page
+	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
+	 * items were effectively assumed to be LP_UNUSED items in the making.  It
+	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
+	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
 	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * Now that freezing has been finalized, unset all_visible and all_frozen
+	 * if there are any LP_DEAD items on the page.  It needs to reflect the
+	 * present state of the page, as expected by our caller.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -1289,8 +1289,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	prstate->ndead++;
 
 	/*
-	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Deliberately delay unsetting all_visible and all_frozen until later
+	 * during pruning. Removable dead tuples shouldn't preclude freezing the
+	 * page.
 	 */
 
 	/* Record the dead offset for vacuum */
@@ -1418,6 +1419,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
+					prstate->all_frozen = false;
 					break;
 				}
 
@@ -1432,14 +1434,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				/*
 				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
+				 * we only update 'all_visible' and 'all_frozen' when freezing
+				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
 					prstate->all_visible = false;
+					prstate->all_frozen = false;
 					break;
 				}
 
@@ -1453,6 +1456,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
 			prstate->all_visible = false;
+			prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1472,6 +1476,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * does, so be consistent.
 			 */
 			prstate->all_visible = false;
+			prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1490,6 +1495,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 */
 			prstate->live_tuples++;
 			prstate->all_visible = false;
+			prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1554,10 +1560,10 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
 	 * handled (handled here, or handled later on).
 	 *
-	 * Similarly, don't unset all_visible until later, at the end of
-	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
-	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * Similarly, don't unset all_visible and all_frozen until later, at the
+	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
+	 * freeze the page after pruning.  As long as we unset it before updating
+	 * the visibility map, this will be correct.
 	 */
 
 	/* Record the dead offset for vacuum */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2b9e5c7f81b..e1b7456823d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2017,7 +2017,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2071,6 +2070,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2176,11 +2176,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0



  [text/x-patch] v20-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch (9.7K, 4-v20-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch)
From 8ae7479e5f9191e26d30c5fa133a8322c19549c5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v20 03/12] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.

The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.

This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
---
 src/backend/access/heap/pruneheap.c | 117 ++++++++++++++--------------
 1 file changed, 57 insertions(+), 60 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7cd51c7be33..86da2743423 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,11 +138,11 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze() to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether to opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -175,7 +175,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
-								  PruneState *prstate);
+								  PruneState *prstate,
+								  TransactionId *frz_conflict_horizon);
 
 
 /*
@@ -306,7 +307,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * performs several pre-freeze checks.
  *
  * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record should we decide to
+ * freeze tuples.
  *
  * prstate is both an input and output parameter.
  *
@@ -318,7 +321,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -388,6 +392,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -433,10 +453,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
  * 'new_relmin_mxid' arguments are required when freezing.  When
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen on exit, to indicate if the VM bits can be set.
- * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
- * passed, because at the moment only callers that also freeze need that
- * information.
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set.  They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -472,6 +492,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
@@ -541,10 +562,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible and all_frozen when we see LP_DEAD items.  We fix that at
-	 * the end of the function, when we return the value to the caller, so
-	 * that the caller doesn't set the VM bits incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
+	 * that after scanning the line pointers, before we return the value to
+	 * the caller, so that the caller doesn't set the VM bits incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -779,7 +800,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
+
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
@@ -841,27 +879,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -887,30 +906,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
-	 * earlier on to make the choice of whether or not to freeze the page
-	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
-	 * items were effectively assumed to be LP_UNUSED items in the making.  It
-	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
-	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible and all_frozen
-	 * if there are any LP_DEAD items on the page.  It needs to reflect the
-	 * present state of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
-- 
2.43.0
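
For readers following the conflict horizon logic that this patch moves
into heap_page_will_freeze(), here is a minimal standalone sketch of the
selection rule. It is not PostgreSQL code. Xid, xid_retreat(),
xid_follows(), freeze_conflict_horizon() and record_conflict_xid() are
hypothetical stand-ins modeled on TransactionId, TransactionIdRetreat()
and TransactionIdFollows(). The fallback to latest_xid_removed in
record_conflict_xid() is an assumption, since the else branch is cut off
by the hunk boundary in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;			/* stand-in for TransactionId */

#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

/* Step back one XID, wrapping past the reserved values below 3. */
static Xid
xid_retreat(Xid xid)
{
	do
	{
		xid--;
	} while (xid < FIRST_NORMAL_XID);
	return xid;
}

/* True if a is logically newer than b, using modulo-2^32 comparison. */
static bool
xid_follows(Xid a, Xid b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Horizon for a record that freezes tuples: the page's visibility cutoff
 * when the page will be all-frozen, otherwise one XID before OldestXmin.
 */
static Xid
freeze_conflict_horizon(bool all_frozen, Xid visibility_cutoff_xid,
						Xid oldest_xmin)
{
	if (all_frozen)
		return visibility_cutoff_xid;

	/* sketch-only sanity check: OldestXmin is assumed to be a normal XID */
	assert(oldest_xmin >= FIRST_NORMAL_XID);
	return xid_retreat(oldest_xmin);
}

/*
 * Combine the freeze horizon with the newest XID removed by pruning.
 * ASSUMPTION: the else branch falls back to latest_xid_removed.
 */
static Xid
record_conflict_xid(Xid frz_conflict_horizon, Xid latest_xid_removed)
{
	if (xid_follows(frz_conflict_horizon, latest_xid_removed))
		return frz_conflict_horizon;
	return latest_xid_removed;
}
```

When no freezing happens the horizon stays at the invalid XID, so the
combined value is simply whatever pruning removed. Computing the horizon
in the same place that decides to freeze is what lets the caller
initialize it once to InvalidTransactionId and pass it by pointer.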



  [text/x-patch] v20-0004-Split-heap_page_prune_and_freeze-into-helpers.patch (25.7K, 5-v20-0004-Split-heap_page_prune_and_freeze-into-helpers.patch)
  download | inline diff:
From 13915bbef249e70af8167f77cc13b5ab88a9948f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v20 04/12] Split heap_page_prune_and_freeze() into helpers

Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it more
clear when the examination of tuples ends and page modifications begin.
---
 src/backend/access/heap/pruneheap.c | 565 +++++++++++++++-------------
 1 file changed, 310 insertions(+), 255 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 86da2743423..9104c742a61 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -150,6 +150,14 @@ typedef struct
 } PruneState;
 
 /* Local functions */
+static void prune_freeze_setup(PruneFreezeParams *params,
+							   TransactionId new_relfrozen_xid,
+							   MultiXactId new_relmin_mxid,
+							   const PruneFreezeResult *presult,
+							   PruneState *prstate);
+static void prune_freeze_plan(Oid reloid, Buffer buffer,
+							  PruneState *prstate,
+							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 											   HeapTuple tup,
 											   Buffer buffer);
@@ -302,204 +310,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 }
 
 /*
- * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
- * performs several pre-freeze checks.
- *
- * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function. *frz_conflict_horizon is set to
- * the snapshot conflict horizon we for the WAL record should we decide to
- * freeze tuples.
- *
- * prstate is both an input and output parameter.
- *
- * Returns true if we should apply the freeze plans and freeze tuples on the
- * page, and false otherwise.
+ * Helper for heap_page_prune_and_freeze() to initialize the PruneState using
+ * the provided parameters.
  */
-static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
-					  bool did_tuple_hint_fpi,
-					  bool do_prune,
-					  bool do_hint_prune,
-					  PruneState *prstate,
-					  TransactionId *frz_conflict_horizon)
-{
-	bool		do_freeze = false;
-
-	/*
-	 * If the caller specified we should not attempt to freeze any tuples,
-	 * validate that everything is in the right state and return.
-	 */
-	if (!prstate->attempt_freeze)
-	{
-		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
-		return false;
-	}
-
-	if (prstate->pagefrz.freeze_required)
-	{
-		/*
-		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
-		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
-		 * advance relfrozenxid/relminmxid.
-		 */
-		do_freeze = true;
-	}
-	else
-	{
-		/*
-		 * Opportunistically freeze the page if we are generating an FPI
-		 * anyway and if doing so means that we can set the page all-frozen
-		 * afterwards (might not happen until VACUUM's final heap pass).
-		 *
-		 * XXX: Previously, we knew if pruning emitted an FPI by checking
-		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
-		 * prune records were combined, this heuristic couldn't be used
-		 * anymore.  The opportunistic freeze heuristic must be improved;
-		 * however, for now, try to approximate the old logic.
-		 */
-		if (prstate->all_frozen && prstate->nfrozen > 0)
-		{
-			Assert(prstate->all_visible);
-
-			/*
-			 * Freezing would make the page all-frozen.  Have already emitted
-			 * an FPI or will do so anyway?
-			 */
-			if (RelationNeedsWAL(relation))
-			{
-				if (did_tuple_hint_fpi)
-					do_freeze = true;
-				else if (do_prune)
-				{
-					if (XLogCheckBufferNeedsBackup(buffer))
-						do_freeze = true;
-				}
-				else if (do_hint_prune)
-				{
-					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-						do_freeze = true;
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->all_frozen)
-			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(*frz_conflict_horizon);
-		}
-	}
-	else if (prstate->nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate->pagefrz.freeze_required);
-
-		prstate->all_frozen = false;
-		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	return do_freeze;
-}
-
-
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
- *
- * Caller must have pin and buffer cleanup lock on the page.  Note that we
- * don't update the FSM information for page on caller's behalf.  Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
- * tuples if it's required in order to advance relfrozenxid / relminmxid, or
- * if it's considered advantageous for overall system performance to do so
- * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.  Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far.  They will be updated
- * with oldest values present on the page after pruning.  After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
-						   PruneFreezeResult *presult,
-						   OffsetNumber *off_loc,
-						   TransactionId *new_relfrozen_xid,
-						   MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+				   TransactionId new_relfrozen_xid,
+				   MultiXactId new_relmin_mxid,
+				   const PruneFreezeResult *presult,
+				   PruneState *prstate)
 {
-	Buffer		buffer = params->buffer;
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber offnum,
-				maxoff;
-	PruneState	prstate;
-	HeapTupleData tup;
-	bool		do_freeze;
-	bool		do_prune;
-	bool		do_hint_prune;
-	bool		did_tuple_hint_fpi;
-	int64		fpi_before = pgWalUsage.wal_fpi;
-	TransactionId frz_conflict_horizon = InvalidTransactionId;
-
 	/* Copy parameters to prstate */
-	prstate.vistest = params->vistest;
-	prstate.mark_unused_now =
+	prstate->vistest = params->vistest;
+	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = params->cutoffs;
+	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -512,40 +338,41 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * prunable, we will save the lowest relevant XID in new_prune_xid. Also
 	 * initialize the rest of our working state.
 	 */
-	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.latest_xid_removed = InvalidTransactionId;
-	prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
-	prstate.nroot_items = 0;
-	prstate.nheaponly_items = 0;
+	prstate->new_prune_xid = InvalidTransactionId;
+	prstate->latest_xid_removed = InvalidTransactionId;
+	prstate->nredirected = prstate->ndead = prstate->nunused = 0;
+	prstate->nfrozen = 0;
+	prstate->nroot_items = 0;
+	prstate->nheaponly_items = 0;
 
 	/* initialize page freezing working state */
-	prstate.pagefrz.freeze_required = false;
-	if (prstate.attempt_freeze)
+	prstate->pagefrz.freeze_required = false;
+	if (prstate->attempt_freeze)
 	{
-		Assert(new_relfrozen_xid && new_relmin_mxid);
-		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
-		prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.FreezePageRelfrozenXid = new_relfrozen_xid;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = new_relfrozen_xid;
+		prstate->pagefrz.FreezePageRelminMxid = new_relmin_mxid;
+		prstate->pagefrz.NoFreezePageRelminMxid = new_relmin_mxid;
 	}
 	else
 	{
-		Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
-		prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+		Assert(new_relfrozen_xid == InvalidTransactionId &&
+			   new_relmin_mxid == InvalidMultiXactId);
+		prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
 	}
 
-	prstate.ndeleted = 0;
-	prstate.live_tuples = 0;
-	prstate.recently_dead_tuples = 0;
-	prstate.hastup = false;
-	prstate.lpdead_items = 0;
-	prstate.deadoffsets = presult->deadoffsets;
+	prstate->ndeleted = 0;
+	prstate->live_tuples = 0;
+	prstate->recently_dead_tuples = 0;
+	prstate->hastup = false;
+	prstate->lpdead_items = 0;
+	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
+	 * Vacuum may update the VM after we're done.  We can keep track of
 	 * whether the page will be all-visible and all-frozen after pruning and
 	 * freezing to help the caller to do that.
 	 *
@@ -567,10 +394,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * that after scanning the line pointers, before we return the value to
 	 * the caller, so that the caller doesn't set the VM bits incorrectly.
 	 */
-	if (prstate.attempt_freeze)
+	if (prstate->attempt_freeze)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = true;
+		prstate->all_visible = true;
+		prstate->all_frozen = true;
 	}
 	else
 	{
@@ -578,8 +405,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Initializing to false allows skipping the work to update them in
 		 * heap_prune_record_unchanged_lp_normal().
 		 */
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
 	}
 
 	/*
@@ -590,10 +417,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * running transaction on the standby does not see tuples on the page as
 	 * all-visible, so the conflict horizon remains InvalidTransactionId.
 	 */
-	prstate.visibility_cutoff_xid = InvalidTransactionId;
+	prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
 
-	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(params->relation);
+/*
+ * Helper for heap_page_prune_and_freeze(). Iterates over every tuple on the
+ * page, examines its visibility information, and determines the appropriate
+ * action for each tuple. All tuples are processed and classified during this
+ * phase, but no modifications are made to the page until the later execution
+ * stage.
+ *
+ * *off_loc is used for error callback and cleared before returning.
+ */
+static void
+prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
+				  OffsetNumber *off_loc)
+{
+	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber offnum;
+	HeapTupleData tup;
+
+	tup.t_tableOid = reloid;
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -628,13 +474,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		*off_loc = offnum;
 
-		prstate.processed[offnum] = false;
-		prstate.htsv[offnum] = -1;
+		prstate->processed[offnum] = false;
+		prstate->htsv[offnum] = -1;
 
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
 			continue;
 		}
 
@@ -644,17 +490,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * If the caller set mark_unused_now true, we can set dead line
 			 * pointers LP_UNUSED now.
 			 */
-			if (unlikely(prstate.mark_unused_now))
-				heap_prune_record_unused(&prstate, offnum, false);
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
 			continue;
 		}
 
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* This is the start of a HOT chain */
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 			continue;
 		}
 
@@ -668,21 +514,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		tup.t_len = ItemIdGetLength(itemid);
 		ItemPointerSet(&tup.t_self, blockno, offnum);
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
+															buffer);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 		else
-			prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+			prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
 	}
 
-	/*
-	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
-	 * an FPI to be emitted.
-	 */
-	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
 	/*
 	 * Process HOT chains.
 	 *
@@ -694,30 +534,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * the page instead of using the root_items array, also did it in
 	 * ascending offset number order.)
 	 */
-	for (int i = prstate.nroot_items - 1; i >= 0; i--)
+	for (int i = prstate->nroot_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.root_items[i];
+		offnum = prstate->root_items[i];
 
 		/* Ignore items already processed as part of an earlier chain */
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
 	}
 
 	/*
 	 * Process any heap-only tuples that were not already processed as part of
 	 * a HOT chain.
 	 */
-	for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+	for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.heaponly_items[i];
+		offnum = prstate->heaponly_items[i];
 
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
@@ -736,7 +576,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * return true for an XMIN_INVALID tuple, so this code will work even
 		 * when there were sequential updates within the aborted transaction.)
 		 */
-		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+		if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
 		{
 			ItemId		itemid = PageGetItemId(page, offnum);
 			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -744,8 +584,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
 			{
 				HeapTupleHeaderAdvanceConflictHorizon(htup,
-													  &prstate.latest_xid_removed);
-				heap_prune_record_unused(&prstate, offnum, true);
+													  &prstate->latest_xid_removed);
+				heap_prune_record_unused(prstate, offnum, true);
 			}
 			else
 			{
@@ -762,7 +602,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -773,12 +613,227 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	{
 		*off_loc = offnum;
 
-		Assert(prstate.processed[offnum]);
+		Assert(prstate->processed[offnum]);
 	}
 #endif
 
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Decide whether to proceed with freezing according to the freeze plans
+ * prepared for the given heap buffer. If freezing is chosen, this function
+ * performs several pre-freeze checks.
+ *
+ * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record should we decide to
+ * freeze tuples.
+ *
+ * prstate is both an input and output parameter.
+ *
+ * Returns true if we should apply the freeze plans and freeze tuples on the
+ * page, and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and return.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			Assert(prstate->all_visible);
+
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * Caller must have pin and buffer cleanup lock on the page.  Note that we
+ * don't update the FSM information for page on caller's behalf.  Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set.  They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.  Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+						   PruneFreezeResult *presult,
+						   OffsetNumber *off_loc,
+						   TransactionId *new_relfrozen_xid,
+						   MultiXactId *new_relmin_mxid)
+{
+	Buffer		buffer = params->buffer;
+	Page		page = BufferGetPage(buffer);
+	PruneState	prstate;
+	bool		do_freeze;
+	bool		do_prune;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
+	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+	/* Initialize prstate */
+	prune_freeze_setup(params,
+					   new_relfrozen_xid ?
+					   *new_relfrozen_xid : InvalidTransactionId,
+					   new_relmin_mxid ?
+					   *new_relmin_mxid : InvalidMultiXactId,
+					   presult,
+					   &prstate);
+
+	/*
+	 * Examine all line pointers and tuple visibility information to determine
+	 * which line pointers should change state and which tuples may be frozen.
+	 * Prepare queue of state changes to later be executed in a critical
+	 * section.
+	 */
+	prune_freeze_plan(RelationGetRelid(params->relation),
+					  buffer, &prstate, off_loc);
+
+	/*
+	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
+	 * checking tuple visibility information in prune_freeze_plan() may have
+	 * caused an FPI to be emitted.
+	 */
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	do_prune = prstate.nredirected > 0 ||
 		prstate.ndead > 0 ||
-- 
2.43.0
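
The control flow that results from this split can be sketched in a small
standalone program. This is only an illustration under stated
assumptions, not the patch itself: SketchState, SketchResult, ItemKind,
sketch_setup(), sketch_plan(), sketch_prune_and_freeze() and
sketch_fpi_counter (standing in for pgWalUsage.wal_fpi) are all
hypothetical names. Deriving attempt_freeze from a non-NULL pointer is a
simplification; the real code takes it from params->options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;			/* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

typedef enum
{
	ITEM_ALL_VISIBLE,			/* live tuple visible to everyone */
	ITEM_NOT_ALL_VISIBLE,		/* live tuple not yet visible to everyone */
	ITEM_LP_DEAD				/* dead line pointer left for vacuum */
} ItemKind;

typedef struct
{
	bool		attempt_freeze;
	Xid			relfrozen_xid;
	int			lpdead_items;
	bool		all_visible;
	bool		all_frozen;
} SketchState;

typedef struct
{
	bool		all_visible;
	bool		all_frozen;
	bool		freeze_saw_all_frozen;	/* flag as seen by freeze decision */
	bool		did_tuple_hint_fpi;
} SketchResult;

static int64_t sketch_fpi_counter = 0;	/* stand-in for a WAL FPI counter */

/* Setup phase: takes plain values, so it never dereferences NULL. */
static void
sketch_setup(bool attempt_freeze, Xid relfrozen_xid, SketchState *st)
{
	assert(attempt_freeze || relfrozen_xid == INVALID_XID);
	st->attempt_freeze = attempt_freeze;
	st->relfrozen_xid = relfrozen_xid;
	st->lpdead_items = 0;
	st->all_visible = st->all_frozen = attempt_freeze;
}

/* Plan phase: classify items only; LP_DEAD does not clear the flags yet. */
static void
sketch_plan(const ItemKind *items, size_t nitems, bool hint_causes_fpi,
			SketchState *st)
{
	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] == ITEM_LP_DEAD)
			st->lpdead_items++;
		else if (items[i] == ITEM_NOT_ALL_VISIBLE)
			st->all_visible = st->all_frozen = false;
	}
	if (hint_causes_fpi)
		sketch_fpi_counter++;	/* simulates a hint-bit FPI during checks */
}

static SketchResult
sketch_prune_and_freeze(const Xid *new_relfrozen_xid, const ItemKind *items,
						size_t nitems, bool hint_causes_fpi)
{
	SketchState st;
	SketchResult res;
	int64_t		fpi_before = sketch_fpi_counter;

	sketch_setup(new_relfrozen_xid != NULL,
				 new_relfrozen_xid ? *new_relfrozen_xid : INVALID_XID,
				 &st);
	sketch_plan(items, nitems, hint_causes_fpi, &st);

	/* The counter is sampled by the caller, around the plan phase. */
	res.did_tuple_hint_fpi = (fpi_before != sketch_fpi_counter);

	/* The freeze decision would run here, with LP_DEAD items ignored. */
	res.freeze_saw_all_frozen = st.all_frozen;

	/* Only afterwards do LP_DEAD items clear the flags for the caller. */
	if (st.lpdead_items > 0)
		st.all_visible = st.all_frozen = false;

	res.all_visible = st.all_visible;
	res.all_frozen = st.all_frozen;
	return res;
}
```

Passing the relfrozenxid by value with a sentinel keeps the NULL check in
one place, and sampling the FPI counter before setup and after planning
keeps that bookkeeping in the top-level function rather than in either
helper.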



  [text/x-patch] v20-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (43.5K, 6-v20-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From f20655f822e7b83d14cc6616a992b69799c859cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v20 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c |  37 ++-
 src/backend/access/heap/pruneheap.c   | 460 +++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c  | 241 +-------------
 src/include/access/heapam.h           |  43 ++-
 4 files changed, 447 insertions(+), 334 deletions(-)
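
Reviewer note (not part of the patch): the conflict-horizon selection that
get_conflict_xid() performs in the pruneheap.c hunk below can be exercised on
its own. What follows is a minimal standalone sketch, not the patch's code.
Xid, INVALID_XID, FIRST_NORMAL_XID, xid_follows() and sketch_conflict_xid()
are hypothetical stand-ins for TransactionId, InvalidTransactionId,
FirstNormalTransactionId, TransactionIdFollows() and get_conflict_xid(), and
xid_follows() is assumed to be a simplified version of the real
wraparound-aware comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL types and constants. */
typedef uint32_t Xid;
#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

/*
 * Simplified, assumed equivalent of TransactionIdFollows(): returns true if
 * a is logically later than b, using modulo-2^32 arithmetic for normal XIDs
 * and a plain comparison when either XID is a special (non-normal) value.
 */
static bool
xid_follows(Xid a, Xid b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Mirrors the decision order of get_conflict_xid() as it appears in this
 * patch: skip the horizon entirely when only marking an already all-visible
 * page all-frozen, otherwise start from the VM or freeze horizon and raise
 * it to the newest removed xmax if that is later.
 */
static Xid
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					Xid latest_xid_removed, Xid frz_conflict_horizon,
					Xid visibility_cutoff_xid, bool blk_already_av,
					bool set_blk_all_frozen)
{
	Xid			conflict_xid = INVALID_XID;

	/* VM-only change on an already all-visible page: no conflict possible */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		return INVALID_XID;

	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Pruning a tuple with a later xmax widens the horizon */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

For example, with do_prune and do_set_vm both true, a removed xmax of 600 and
a visibility cutoff of 500, the sketch picks 600, since the record has to
conflict with every snapshot that could still see the pruned tuple.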

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9104c742a61..5667df86bae 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -133,16 +135,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are visible to all.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -181,11 +184,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate,
 								  TransactionId *frz_conflict_horizon);
-
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -272,6 +286,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * current implementation.
 			 */
 			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+				.vmbuffer = InvalidBuffer,.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,.options = 0,
 				.vistest = vistest,.cutoffs = NULL
 			};
@@ -325,6 +340,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -372,50 +389,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers, before we return the value to
-	 * the caller, so that the caller doesn't set the VM bits incorrectly.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is no longer updated. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -753,10 +774,133 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
+
+
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -771,12 +915,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -801,14 +946,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -875,6 +1027,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -896,14 +1076,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -917,35 +1100,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
+		{
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -955,28 +1146,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1468,6 +1678,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
@@ -2118,6 +2330,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use that xmax as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2142,6 +2413,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2151,6 +2431,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2187,7 +2468,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2305,7 +2586,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e1b7456823d..a7a974b6639 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -1966,7 +1952,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {.relation = rel,.buffer = buf,
-		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+		.vmbuffer = vmbuffer,.blk_known_av = all_visible_according_to_vm,
+		.reason = PRUNE_VACUUM_SCAN,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
 		.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
 	};
 
@@ -2009,33 +1997,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2069,168 +2030,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2952,6 +2771,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -3632,30 +3452,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
-
 /*
  * Check whether the heap page in buf is all-visible except for the dead
  * tuples referenced in the deadoffsets array.
@@ -3678,15 +3474,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..937b46a77db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -285,19 +298,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -424,6 +433,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
@@ -433,6 +443,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0
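
The counter logic that lazy_scan_prune() is left with after this patch can be
read in isolation. The following is a minimal standalone sketch of it, not
code from the patch set: VmCounters and count_vm_transition() are hypothetical
names, the return value stands in for the patch's *vm_page_frozen output, and
the bit values are assumed to match visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values assumed to match PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three LVRelState counters used here */
typedef struct VmCounters
{
	int			vm_new_visible_pages;
	int			vm_new_visible_frozen_pages;
	int			vm_new_frozen_pages;
} VmCounters;

/*
 * Classify one page's VM transition for logging purposes.  Returns true
 * when the page was newly set all-frozen (hypothetical replacement for
 * the *vm_page_frozen output parameter).
 */
bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Newly all-visible, and possibly all-frozen at the same time */
		c->vm_new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->vm_new_visible_frozen_pages++;
			return true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Was already all-visible; only the all-frozen bit is new */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->vm_new_frozen_pages++;
		return true;
	}
	return false;
}
```

Because both inputs are plain bit masks, the caller no longer needs
all_visible_according_to_vm or a second VM lookup to decide which counter to
bump; a page whose bits did not change falls through both branches.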



  [text/x-patch] v20-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 7-v20-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 8519fcd867cd83f416513e635e0591af6c86a712 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v20 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7a974b6639..fa7be0f857f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
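
For readers unfamiliar with the map layout, here is a rough sketch of the bit
arithmetic behind setting VM bits for one heap block. It is modeled on the
status/OR logic of the older visibilitymap_set() and assumes two status bits
per heap block, packed four blocks to a byte. toy_vm_set() and TOY_MAP_BYTES
are hypothetical; the real functions operate on a pinned and locked VM buffer
rather than a bare array.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed to match PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Assumed layout: two status bits per heap block, four blocks per byte */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE 4

/* Hypothetical toy map size; the real map spans many pages */
#define TOY_MAP_BYTES 16

/*
 * Set flags for heap_blk in the toy map and return the bits that were
 * set beforehand, so the caller can tell whether anything changed.
 */
uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = (heap_blk / HEAPBLOCKS_PER_BYTE) % TOY_MAP_BYTES;
	uint8_t		map_offset = (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) *
										BITS_PER_HEAPBLOCK);
	uint8_t		status;

	/* Only valid bits, and never all-frozen without all-visible */
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

Returning the previous bits is what lets a caller compare old and new state
and skip further work when the two are equal.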



  [text/x-patch] v20-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.2K, 8-v20-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 2e0e8e8365e4abd086978db70890bffd6e367b2e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v20 07/12] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 54 insertions(+), 379 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4b0c49f4bb0..2bff37e03b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5667df86bae..d6b22b7b1c5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1110,9 +1110,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2394,14 +2394,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa7be0f857f..fd68dfcfce2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8698918f443..76343fdf476 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4289,7 +4289,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
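
For readers following the removal of the old visibilitymap_set() above, here is a minimal standalone sketch of the bit addressing it performed: two bits per heap block, four heap blocks per map byte. It is not the PostgreSQL implementation. TOY_MAPSIZE, ToyVmPage and the toy_* helpers are hypothetical names made up for illustration, the helper arithmetic is my reading of what the HEAPBLK_TO_* macros compute, and buffer locking, WAL logging and the critical section are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as used by the visibility map. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/* Assumption: a tiny map "page" so the arithmetic is easy to follow. */
#define TOY_MAPSIZE				16
#define TOY_HEAPBLOCKS_PER_PAGE	(TOY_MAPSIZE * HEAPBLOCKS_PER_BYTE)

typedef struct ToyVmPage
{
	uint8_t		map[TOY_MAPSIZE];
	int			dirty;			/* stands in for MarkBufferDirty() */
} ToyVmPage;

/* Which map page, which byte in it, and which bit offset in that byte. */
static inline uint32_t
toy_mapblock(uint32_t heapBlk)
{
	return heapBlk / TOY_HEAPBLOCKS_PER_PAGE;
}

static inline uint32_t
toy_mapbyte(uint32_t heapBlk)
{
	return (heapBlk % TOY_HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

static inline uint8_t
toy_mapoffset(uint32_t heapBlk)
{
	return (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/*
 * Set flags for heapBlk and return the bits that were set before the call.
 * The page is only touched (and marked dirty) when the bits actually change.
 */
static inline uint8_t
toy_vm_set(ToyVmPage *page, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = toy_mapbyte(heapBlk);
	uint8_t		mapOffset = toy_mapoffset(heapBlk);
	uint8_t		status;

	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Must never set all_frozen without also setting all_visible. */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (page->map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
	if (flags != status)
	{
		page->map[mapByte] |= (uint8_t) (flags << mapOffset);
		page->dirty = 1;
	}
	return status;
}
```

For example, with these toy sizes heap block 5 lands in byte 1 at bit offset 2, so setting VISIBILITYMAP_ALL_VISIBLE on an empty page leaves map[1] == 0x04.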



  [text/x-patch] v20-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.0K, 9-v20-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 321461cd0fa02408657f46d2ec0495e8a69790d7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v20 08/12] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d6b22b7b1c5..a2d872e5beb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -244,7 +244,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -469,7 +469,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1236,11 +1236,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1699,7 +1699,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v20-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 10-v20-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
  download | inline diff:
From 013dd7d70e6af1ad578e9ff4a3753830e9548cbb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v20 09/12] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a2d872e5beb..fdbed5ac74d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -432,11 +432,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -882,14 +883,13 @@ heap_page_will_set_vis(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -991,6 +991,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1163,10 +1173,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1696,20 +1705,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fd68dfcfce2..fdf37625cd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3489,7 +3489,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3505,7 +3505,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3579,7 +3579,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 937b46a77db..2b6a521e4ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,10 +276,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -458,6 +457,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
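
The commit message of 0009 above argues that one horizon check per page is enough as long as the newest xmin on the page is tracked. A rough standalone sketch of that idea follows. It is not the patch's code: ToyXid, ToyVisState and the toy_* functions are hypothetical stand-ins, the "horizon" is a single XID rather than a real GlobalVisState, and the page is reduced to an array of xmins of live tuples, with no dead or in-progress tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			0
#define TOY_FIRST_NORMAL_XID	3	/* smaller values are special XIDs */

/* Hypothetical stand-in for GlobalVisState: one horizon plus a call counter. */
typedef struct ToyVisState
{
	ToyXid		horizon;		/* XIDs preceding this are visible to all */
	unsigned	checks;			/* how many (expensive) checks were made */
} ToyVisState;

static inline bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Modular comparisons, only meaningful when both XIDs are normal. */
static inline bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

static inline bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Stand-in for the comparatively expensive per-XID horizon test. */
static inline bool
toy_visible_to_all(ToyVisState *state, ToyXid xid)
{
	state->checks++;
	return toy_xid_precedes(xid, state->horizon);
}

/*
 * Scan all xmins, remembering only the newest normal one, then consult the
 * horizon once. If the newest xmin is visible to all, every older xmin is too.
 */
static inline bool
toy_page_all_visible(ToyVisState *state, const ToyXid *xmins, size_t ntuples)
{
	ToyXid		newest = TOY_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (!toy_xid_is_normal(xmins[i]))
			continue;			/* e.g. frozen tuples need no check */
		if (newest == TOY_INVALID_XID || toy_xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	if (toy_xid_is_normal(newest) && !toy_visible_to_all(state, newest))
		return false;
	return true;
}
```

The trade-off matches the one described in the commit message: the loop always visits every xmin, even when an early tuple would already have ruled the page out, but the horizon test runs at most once per page.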



  [text/x-patch] v20-0010-Unset-all_visible-sooner-if-not-freezing.patch (2.4K, 11-v20-0010-Unset-all_visible-sooner-if-not-freezing.patch)
  download | inline diff:
From 6da1fcc5cb57cfc3b21ebb741dcde6fa207ccc4a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v20 10/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fdbed5ac74d..afb4251ad91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1562,8 +1562,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1822,8 +1827,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v20-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (37.7K, 12-v20-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 9aac6ebedbc68301ee8c3d6da8aef54838851c90 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v20 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c               |  2 +-
 src/backend/access/brin/brin.c                |  3 +-
 src/backend/access/gin/gininsert.c            |  3 +-
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 22 ++++--
 src/backend/access/heap/pruneheap.c           | 69 +++++++++++++++----
 src/backend/access/index/genam.c              |  4 +-
 src/backend/access/index/indexam.c            |  6 +-
 src/backend/access/nbtree/nbtsort.c           |  2 +-
 src/backend/access/table/tableam.c            |  8 ++-
 src/backend/commands/constraint.c             |  2 +-
 src/backend/commands/copyto.c                 |  2 +-
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/commands/typecmds.c               |  4 +-
 src/backend/executor/execIndexing.c           |  2 +-
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execReplication.c        |  8 +--
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  9 ++-
 src/backend/executor/nodeIndexonlyscan.c      |  2 +-
 src/backend/executor/nodeIndexscan.c          | 11 ++-
 src/backend/executor/nodeSeqscan.c            | 26 ++++++-
 src/backend/partitioning/partbounds.c         |  2 +-
 src/backend/utils/adt/selfuncs.c              |  2 +-
 src/include/access/genam.h                    |  3 +-
 src/include/access/heapam.h                   | 30 +++++++-
 src/include/access/tableam.h                  | 19 ++---
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 29 files changed, 210 insertions(+), 65 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2f7d1437919..8186bba1d7e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2828,7 +2828,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c2b879b2bf6..147844690a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2bff37e03b5..ae53e311ce1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,14 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -99,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -753,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index afb4251ad91..8011130ca8b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -197,7 +197,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -212,9 +214,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -291,6 +297,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.vistest = vistest,.cutoffs = NULL
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -781,6 +794,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -791,7 +807,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -807,6 +825,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -830,6 +865,11 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * This will never trigger for on-access pruning because it couldn't have
+	 * done a previous visibility map lookup and thus will always pass
+	 * blk_known_av as false. A future vacuum will have to take care of fixing
+	 * the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -991,6 +1031,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -1001,14 +1049,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -1052,6 +1092,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2338,7 +2379,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2408,8 +2449,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III, and during on-access
+ *   pruning, the heap page may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af1310..1e7992dbeb3 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..558c4497993 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -50,6 +50,7 @@ char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
 
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -163,10 +164,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -217,7 +219,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 23ebaa3f230..66c418059fe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5979580139f..35560ac60d9 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3154,7 +3154,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3235,7 +3235,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 401606f840a..4e39ac00f30 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cb23ad52782..78fa63e2b73 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6788,7 +6788,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..2f9e9ea6318 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +204,7 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2b6a521e4ea..1e3df54628b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,24 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +440,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..0042636463f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1154,9 +1157,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
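
For readers skimming the diff above: the wrappers table_beginscan() and table_beginscan_bm() now accept caller-supplied flags and OR in their own scan-type bits, so a caller can pass SO_HINT_REL_READ_ONLY through to the AM. Below is a minimal, self-contained sketch of that pattern. It is not the PostgreSQL code: every toy_/TOY_ name and the bit positions are hypothetical, and how the heap AM actually consumes the hint is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of ScanOptions; bit positions are illustrative only. */
typedef enum ToyScanOptions
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_TYPE_BITMAPSCAN = 1 << 1,
	TOY_SO_ALLOW_PAGEMODE = 1 << 2,
	TOY_SO_HINT_REL_READ_ONLY = 1 << 3	/* query does not modify the rel */
} ToyScanOptions;

/*
 * Mirrors the shape of the patched table_beginscan(): the caller's flags are
 * kept and the wrapper adds the bits that identify the scan type.
 */
static inline uint32_t
toy_beginscan_flags(uint32_t flags)
{
	flags |= TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;
	return flags;
}

/*
 * Assumption for illustration: the AM only considers on-access VM work when
 * the read-only hint is present.
 */
static inline int
toy_scan_is_read_only(uint32_t flags)
{
	return (flags & TOY_SO_HINT_REL_READ_ONLY) != 0;
}
```

Callers that cannot prove the relation is unmodified simply pass 0, as most of the call sites touched by the patch do, and the scan behaves as before.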



  [text/x-patch] v20-0012-Set-pd_prune_xid-on-insert.patch (6.7K, 13-v20-0012-Set-pd_prune_xid-on-insert.patch)
From 80beb9d2f82b7b42fd162fbfacf065459afac578 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v20 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ae53e311ce1..f329f497480 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
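
The rule the patch above applies on insert can be sketched in isolation: set the prune hint only for normal XIDs (which excludes bootstrap mode), skip it when the page is being set all-frozen, and keep the oldest XID seen. The toy model below is a sketch under those assumptions, not the real macro. All toy_/TOY_ names are hypothetical, and the "keep the oldest XID" behavior reflects my understanding of PageSetPrunable() rather than anything shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)	/* 1 and 2 are special XIDs */

/* Hypothetical stand-in for the page header's pd_prune_xid field. */
typedef struct ToyPage
{
	ToyXid		prune_xid;
} ToyPage;

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Circular comparison of two normal XIDs, modulo 2^32. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Remember the oldest XID that might make the page worth pruning. */
static void
toy_page_set_prunable(ToyPage *page, ToyXid xid)
{
	assert(toy_xid_is_normal(xid));
	if (page->prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, page->prune_xid))
		page->prune_xid = xid;
}

/* Models the condition used in the patched heap_multi_insert(). */
static void
toy_insert(ToyPage *page, ToyXid xid, bool all_frozen_set)
{
	if (!all_frozen_set && toy_xid_is_normal(xid))
		toy_page_set_prunable(page, xid);
}
```

With this model, an insert by a bootstrap XID leaves the hint unset, and later inserts by newer transactions do not overwrite an older hint.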




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-19 09:35  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-11-19 09:35 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, 18 Nov 2025 at 04:07, Melanie Plageman
<[email protected]> wrote:
>
> Attached v20 has general cleanup, changes to the table/index AM
> callbacks detailed below, and it moves the
> heap_page_prune_and_freeze() refactoring commit down the stack to
> 0004.
>
> 0001 - 0003 are fairly trivial cleanup patches. I think they are ready
> to commit, so if I don't hear any objections in the next few days,
> I'll go ahead and commit them.
>


Hi! I looked over patches 0002-0003 once again, LGTM. In
particular, I think 0002 & 0003 make VM bit management simpler.
My only review comment is about 0003:
Should we make frz_conflict_horizon a field of the PruneState struct
rather than an argument of heap_page_will_freeze? If I'm not
mistaken, 'frz_conflict_horizon' fits well as part of the pruning
state.


-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-19 23:13  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-11-19 23:13 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Nov 19, 2025 at 4:35 AM Kirill Reshke <[email protected]> wrote:
>
> Hi! I looked over patches 0002-0003 once again, LGTM. In
> particular, I think 0002 & 0003 make VM bit management simpler.

Thanks for the review!

> My only review comment is about 0003:
> Should we make frz_conflict_horizon a field of the PruneState struct
> rather than an argument of heap_page_will_freeze? If I'm not
> mistaken, 'frz_conflict_horizon' fits well as part of the pruning
> state.

Since it is passed into one of the helpers, I think I agree. Attached
v21 has this change.

- Melanie


Attachments:

  [text/x-patch] v21-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch (13.0K, 2-v21-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch)
From a87132c42cae9379cea52df91e10d8d5e2677e16 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v21 01/12] Refactor heap_page_prune_and_freeze() parameters
 into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters and upcoming work to handle VM updates in this function will
add even more.

Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Suggested-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
 src/backend/access/heap/pruneheap.c  | 91 +++++++++++++---------------
 src/backend/access/heap/vacuumlazy.c | 12 ++--
 src/include/access/heapam.h          | 63 +++++++++++++++----
 src/tools/pgindent/typedefs.list     |  1 +
 4 files changed, 100 insertions(+), 67 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..e9e14cb42b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -261,12 +261,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeResult presult;
 
 			/*
-			 * For now, pass mark_unused_now as false regardless of whether or
-			 * not the relation has indexes, since we cannot safely determine
-			 * that during on-access pruning with the current implementation.
+			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
+			 * regardless of whether or not the relation has indexes, since we
+			 * cannot safely determine that during on-access pruning with the
+			 * current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+				.reason = PRUNE_ON_ACCESS,.options = 0,
+				.vistest = vistest,.cutoffs = NULL
+			};
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
+									   NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen on exit, to indicate if the VM bits can be set.
+ * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
+ * passed, because at the moment only callers that also freeze need that
+ * information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -486,10 +476,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +574,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +777,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = heap_page_will_freeze(relation, buffer,
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
@@ -838,7 +829,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +867,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index deb9a3dc0d1..2b9e5c7f81b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1965,7 +1965,10 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params = {.relation = rel,.buffer = buf,
+		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+		.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+	};
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -1984,12 +1987,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..632c4332a8c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,56 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It is used to set the WAL record
+	 * opcode which is used for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
+	 * LP_UNUSED during pruning.
+	 *
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
+	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * Contains the cutoffs used for freezing. They are required if the
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
+	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
+	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
+	 * calculates them once, at the beginning of vacuuming the relation.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +314,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +410,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57f2a9ccdc5..c751c25a04d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2348,6 +2348,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0
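
The refactoring in the patch above follows a common C pattern: group the input parameters into a struct and build it with designated initializers, so that fields a caller does not mention default to zero/NULL. A minimal sketch of the pattern follows. The toy_/TOY_ names and the simplified fields are hypothetical stand-ins for PruneFreezeParams and its callers, not the actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PRUNE_MARK_UNUSED_NOW	(1 << 0)
#define TOY_PRUNE_FREEZE			(1 << 1)

typedef enum
{
	TOY_PRUNE_ON_ACCESS,
	TOY_PRUNE_VACUUM_SCAN
} ToyPruneReason;

typedef struct ToyCutoffs
{
	unsigned	oldest_xmin;
} ToyCutoffs;

/* Hypothetical, trimmed-down analogue of PruneFreezeParams. */
typedef struct ToyPruneParams
{
	int			buffer;
	ToyPruneReason reason;
	int			options;
	const ToyCutoffs *cutoffs;	/* required only when freezing */
} ToyPruneParams;

/* Returns 1 if freezing would be attempted, 0 otherwise. */
static int
toy_prune(const ToyPruneParams *params)
{
	int			attempt_freeze = (params->options & TOY_PRUNE_FREEZE) != 0;

	assert(!attempt_freeze || params->cutoffs != NULL);
	return attempt_freeze;
}

/* On-access caller: unmentioned fields (options, cutoffs) are zeroed. */
static int
toy_on_access_prune(int buffer)
{
	ToyPruneParams params = {.buffer = buffer, .reason = TOY_PRUNE_ON_ACCESS};

	return toy_prune(&params);
}

/* Vacuum caller: starts from FREEZE and conditionally adds another option. */
static int
toy_vacuum_prune(int buffer, const ToyCutoffs *cutoffs, int nindexes)
{
	ToyPruneParams params = {.buffer = buffer,
		.reason = TOY_PRUNE_VACUUM_SCAN,
		.options = TOY_PRUNE_FREEZE,
		.cutoffs = cutoffs
	};

	if (nindexes == 0)
		params.options |= TOY_PRUNE_MARK_UNUSED_NOW;

	return toy_prune(&params);
}
```

One practical benefit, as the commit message notes, is that adding a new input later (for example, VM-related state) only touches the struct and the callers that care about it, rather than every call site's argument list.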



  [text/x-patch] v21-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (9.2K, 3-v21-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
From bedba753bb9fcc37b3b5f1a7e38c02828850520d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v21 02/12] Keep all_frozen updated in
 heap_page_prune_and_freeze

Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.

Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 60 +++++++++++++++-------------
 src/backend/access/heap/vacuumlazy.c |  9 ++---
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e9e14cb42b7..7cd51c7be33 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
 	 * whether to freeze the page or not.  The all_visible and all_frozen
 	 * values returned to the caller are adjusted to include LP_DEAD items at
 	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -359,8 +355,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -544,9 +542,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
 	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * all_visible and all_frozen when we see LP_DEAD items.  We fix that at
+	 * the end of the function, when we return the value to the caller, so
+	 * that the caller doesn't set the VM bits incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -783,6 +781,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  do_hint_prune,
 									  &prstate);
 
+	Assert(!prstate.all_frozen || prstate.all_visible);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -852,7 +852,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			if (do_freeze)
 			{
-				if (prstate.all_visible && prstate.all_frozen)
+				if (prstate.all_frozen)
 					frz_conflict_horizon = prstate.visibility_cutoff_xid;
 				else
 				{
@@ -889,16 +889,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
 
 	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
+	 * earlier on to make the choice of whether or not to freeze the page
+	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
+	 * items were effectively assumed to be LP_UNUSED items in the making.  It
+	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
+	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
 	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
+	 * Now that freezing has been finalized, unset all_visible and all_frozen
+	 * if there are any LP_DEAD items on the page.  It needs to reflect the
+	 * present state of the page, as expected by our caller.
 	 */
 	if (prstate.all_visible && prstate.lpdead_items == 0)
 	{
@@ -1289,8 +1289,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	prstate->ndead++;
 
 	/*
-	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Deliberately delay unsetting all_visible and all_frozen until later
+	 * during pruning. Removable dead tuples shouldn't preclude freezing the
+	 * page.
 	 */
 
 	/* Record the dead offset for vacuum */
@@ -1418,6 +1419,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
+					prstate->all_frozen = false;
 					break;
 				}
 
@@ -1432,14 +1434,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				/*
 				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
-				 * non-freezing caller wanted to set the VM bit.
+				 * we only update 'all_visible' and 'all_frozen' when freezing
+				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
 					prstate->all_visible = false;
+					prstate->all_frozen = false;
 					break;
 				}
 
@@ -1453,6 +1456,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
 			prstate->all_visible = false;
+			prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1472,6 +1476,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * does, so be consistent.
 			 */
 			prstate->all_visible = false;
+			prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1490,6 +1495,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 */
 			prstate->live_tuples++;
 			prstate->all_visible = false;
+			prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1554,10 +1560,10 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
 	 * handled (handled here, or handled later on).
 	 *
-	 * Similarly, don't unset all_visible until later, at the end of
-	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
-	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * Similarly, don't unset all_visible and all_frozen until later, at the
+	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
+	 * freeze the page after pruning.  As long as we unset it before updating
+	 * the visibility map, this will be correct.
 	 */
 
 	/* Record the dead offset for vacuum */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2b9e5c7f81b..e1b7456823d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2017,7 +2017,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2071,6 +2070,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2176,11 +2176,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0
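
The invariant asserted in the patch above (all_frozen is never set unless all_visible is also set) can be illustrated with a minimal standalone sketch. The struct and helper names below (PageFlags, mark_not_all_visible, flags_are_consistent) are hypothetical and do not exist in pruneheap.c; only the rule itself is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two PruneState flags discussed above. */
typedef struct PageFlags
{
	bool		all_visible;
	bool		all_frozen;
} PageFlags;

/* The invariant the patch asserts: all_frozen implies all_visible. */
static bool
flags_are_consistent(const PageFlags *flags)
{
	return !flags->all_frozen || flags->all_visible;
}

/*
 * Clear both flags together, as the patch now does everywhere it clears
 * all_visible, so that all_frozen can never outlive all_visible.
 */
static void
mark_not_all_visible(PageFlags *flags)
{
	flags->all_visible = false;
	flags->all_frozen = false;
	assert(flags_are_consistent(flags));
}
```

Because the two flags are always cleared as a pair, a caller can test all_frozen alone instead of testing all_visible && all_frozen, which is the simplification the diff makes in heap_page_will_freeze() and lazy_scan_prune().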



  [text/x-patch] v21-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch (9.5K, 4-v21-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch)
  download | inline diff:
From 021ad801205c44581c68d826a03e53ed678abdf0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v21 03/12] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

To move the VM update into the same WAL record that prunes and freezes
tuples, we must know whether the page will be marked
all-visible/all-frozen before emitting WAL.

The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.

This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 116 ++++++++++++++--------------
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7cd51c7be33..8e40565381f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,13 @@ typedef struct
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
 
+	/*
+	 * The snapshot conflict horizon used when freezing tuples. The final
+	 * snapshot conflict horizon for the record may be newer if pruning
+	 * removes newer transaction IDs.
+	 */
+	TransactionId frz_conflict_horizon;
+
 	/*
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
@@ -138,11 +145,11 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze() to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether to opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -388,6 +395,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(prstate->frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -433,10 +456,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
  * 'new_relmin_mxid' arguments are required when freezing.  When
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen on exit, to indicate if the VM bits can be set.
- * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
- * passed, because at the moment only callers that also freeze need that
- * information.
+ * and presult->all_frozen after determining whether or not to
+ * opporunistically freeze, to indicate if the VM bits can be set.  They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -522,6 +545,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prstate.hastup = false;
 	prstate.lpdead_items = 0;
 	prstate.deadoffsets = presult->deadoffsets;
+	prstate.frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * Caller may update the VM after we're done.  We can keep track of
@@ -541,10 +565,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible and all_frozen when we see LP_DEAD items.  We fix that at
-	 * the end of the function, when we return the value to the caller, so
-	 * that the caller doesn't set the VM bits incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
+	 * that after scanning the line pointers, before we return the value to
+	 * the caller, so that the caller doesn't set the VM bits incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -781,6 +805,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  do_hint_prune,
 									  &prstate);
 
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
+
 	Assert(!prstate.all_frozen || prstate.all_visible);
 
 	/* Any error while applying the changes is critical */
@@ -841,29 +881,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
+			if (TransactionIdFollows(prstate.frz_conflict_horizon,
+									 prstate.latest_xid_removed))
+				conflict_xid = prstate.frz_conflict_horizon;
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
@@ -887,30 +909,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
-	 * earlier on to make the choice of whether or not to freeze the page
-	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
-	 * items were effectively assumed to be LP_UNUSED items in the making.  It
-	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
-	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible and all_frozen
-	 * if there are any LP_DEAD items on the page.  It needs to reflect the
-	 * present state of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
-- 
2.43.0
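
The conflict horizon selection that this patch moves into heap_page_will_freeze() can be sketched in isolation as follows. This is a simplified model, not the patch's code: TransactionId is reduced to a plain 32-bit integer, and xid_follows(), xid_retreat(), freeze_conflict_horizon() and record_conflict_xid() are hypothetical helpers that only approximate TransactionIdFollows(), TransactionIdRetreat() and the logic shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's transaction ID definitions. */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* True if a is logically later than b (modulo-2^32 for normal XIDs). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Step back one XID, skipping over the special (non-normal) values. */
static TransactionId
xid_retreat(TransactionId xid)
{
	do
	{
		xid--;
	} while (xid < FirstNormalTransactionId);
	return xid;
}

/*
 * Horizon for the freeze part of the record: the page's visibility cutoff
 * if the whole page will become all-frozen, otherwise a conservative value
 * one step before OldestXmin.
 */
static TransactionId
freeze_conflict_horizon(bool all_frozen,
						TransactionId visibility_cutoff_xid,
						TransactionId oldest_xmin)
{
	if (all_frozen)
		return visibility_cutoff_xid;

	/* Assumption of this sketch: OldestXmin is always a normal XID. */
	assert(oldest_xmin >= FirstNormalTransactionId);
	return xid_retreat(oldest_xmin);
}

/* The record uses whichever is newer: freeze horizon or newest pruned XID. */
static TransactionId
record_conflict_xid(TransactionId frz_conflict_horizon,
					TransactionId latest_xid_removed)
{
	if (xid_follows(frz_conflict_horizon, latest_xid_removed))
		return frz_conflict_horizon;
	return latest_xid_removed;
}
```

Computing the freeze horizon while all_frozen still ignores LP_DEAD items is what allows the flags to be corrected right after the freeze decision, before the WAL record is built. When no freezing happens, the horizon stays InvalidTransactionId and record_conflict_xid() falls back to latest_xid_removed.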



  [text/x-patch] v21-0004-Split-heap_page_prune_and_freeze-into-helpers.patch (25.4K, 5-v21-0004-Split-heap_page_prune_and_freeze-into-helpers.patch)
  download | inline diff:
From fd070d6954e5156523dafe35392654453c1d8684 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v21 04/12] Split heap_page_prune_and_freeze() into helpers

Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it
clearer where the examination of tuples ends and page modifications
begin.
---
 src/backend/access/heap/pruneheap.c | 559 +++++++++++++++-------------
 1 file changed, 307 insertions(+), 252 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8e40565381f..b10c5eb1163 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -157,6 +157,14 @@ typedef struct
 } PruneState;
 
 /* Local functions */
+static void prune_freeze_setup(PruneFreezeParams *params,
+							   TransactionId new_relfrozen_xid,
+							   MultiXactId new_relmin_mxid,
+							   const PruneFreezeResult *presult,
+							   PruneState *prstate);
+static void prune_freeze_plan(Oid reloid, Buffer buffer,
+							  PruneState *prstate,
+							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 											   HeapTuple tup,
 											   Buffer buffer);
@@ -308,200 +316,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 }
 
 /*
- * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
- * performs several pre-freeze checks.
- *
- * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
- *
- * prstate is both an input and output parameter.
- *
- * Returns true if we should apply the freeze plans and freeze tuples on the
- * page, and false otherwise.
+ * Helper for heap_page_prune_and_freeze() to initialize the PruneState using
+ * the provided parameters.
  */
-static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
-					  bool did_tuple_hint_fpi,
-					  bool do_prune,
-					  bool do_hint_prune,
-					  PruneState *prstate)
-{
-	bool		do_freeze = false;
-
-	/*
-	 * If the caller specified we should not attempt to freeze any tuples,
-	 * validate that everything is in the right state and return.
-	 */
-	if (!prstate->attempt_freeze)
-	{
-		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
-		return false;
-	}
-
-	if (prstate->pagefrz.freeze_required)
-	{
-		/*
-		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
-		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
-		 * advance relfrozenxid/relminmxid.
-		 */
-		do_freeze = true;
-	}
-	else
-	{
-		/*
-		 * Opportunistically freeze the page if we are generating an FPI
-		 * anyway and if doing so means that we can set the page all-frozen
-		 * afterwards (might not happen until VACUUM's final heap pass).
-		 *
-		 * XXX: Previously, we knew if pruning emitted an FPI by checking
-		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
-		 * prune records were combined, this heuristic couldn't be used
-		 * anymore.  The opportunistic freeze heuristic must be improved;
-		 * however, for now, try to approximate the old logic.
-		 */
-		if (prstate->all_frozen && prstate->nfrozen > 0)
-		{
-			Assert(prstate->all_visible);
-
-			/*
-			 * Freezing would make the page all-frozen.  Have already emitted
-			 * an FPI or will do so anyway?
-			 */
-			if (RelationNeedsWAL(relation))
-			{
-				if (did_tuple_hint_fpi)
-					do_freeze = true;
-				else if (do_prune)
-				{
-					if (XLogCheckBufferNeedsBackup(buffer))
-						do_freeze = true;
-				}
-				else if (do_hint_prune)
-				{
-					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-						do_freeze = true;
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
-	}
-	else if (prstate->nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate->pagefrz.freeze_required);
-
-		prstate->all_frozen = false;
-		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	return do_freeze;
-}
-
-
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
- *
- * Caller must have pin and buffer cleanup lock on the page.  Note that we
- * don't update the FSM information for page on caller's behalf.  Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
- * tuples if it's required in order to advance relfrozenxid / relminmxid, or
- * if it's considered advantageous for overall system performance to do so
- * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.  Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far.  They will be updated
- * with oldest values present on the page after pruning.  After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
-						   PruneFreezeResult *presult,
-						   OffsetNumber *off_loc,
-						   TransactionId *new_relfrozen_xid,
-						   MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+				   TransactionId new_relfrozen_xid,
+				   MultiXactId new_relmin_mxid,
+				   const PruneFreezeResult *presult,
+				   PruneState *prstate)
 {
-	Buffer		buffer = params->buffer;
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber offnum,
-				maxoff;
-	PruneState	prstate;
-	HeapTupleData tup;
-	bool		do_freeze;
-	bool		do_prune;
-	bool		do_hint_prune;
-	bool		did_tuple_hint_fpi;
-	int64		fpi_before = pgWalUsage.wal_fpi;
-
 	/* Copy parameters to prstate */
-	prstate.vistest = params->vistest;
-	prstate.mark_unused_now =
+	prstate->vistest = params->vistest;
+	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = params->cutoffs;
+	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -514,41 +344,42 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * prunable, we will save the lowest relevant XID in new_prune_xid. Also
 	 * initialize the rest of our working state.
 	 */
-	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.latest_xid_removed = InvalidTransactionId;
-	prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
-	prstate.nroot_items = 0;
-	prstate.nheaponly_items = 0;
+	prstate->new_prune_xid = InvalidTransactionId;
+	prstate->latest_xid_removed = InvalidTransactionId;
+	prstate->nredirected = prstate->ndead = prstate->nunused = 0;
+	prstate->nfrozen = 0;
+	prstate->nroot_items = 0;
+	prstate->nheaponly_items = 0;
 
 	/* initialize page freezing working state */
-	prstate.pagefrz.freeze_required = false;
-	if (prstate.attempt_freeze)
+	prstate->pagefrz.freeze_required = false;
+	if (prstate->attempt_freeze)
 	{
-		Assert(new_relfrozen_xid && new_relmin_mxid);
-		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
-		prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.FreezePageRelfrozenXid = new_relfrozen_xid;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = new_relfrozen_xid;
+		prstate->pagefrz.FreezePageRelminMxid = new_relmin_mxid;
+		prstate->pagefrz.NoFreezePageRelminMxid = new_relmin_mxid;
 	}
 	else
 	{
-		Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
-		prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+		Assert(new_relfrozen_xid == InvalidTransactionId &&
+			   new_relmin_mxid == InvalidMultiXactId);
+		prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
 	}
 
-	prstate.ndeleted = 0;
-	prstate.live_tuples = 0;
-	prstate.recently_dead_tuples = 0;
-	prstate.hastup = false;
-	prstate.lpdead_items = 0;
-	prstate.deadoffsets = presult->deadoffsets;
-	prstate.frz_conflict_horizon = InvalidTransactionId;
+	prstate->ndeleted = 0;
+	prstate->live_tuples = 0;
+	prstate->recently_dead_tuples = 0;
+	prstate->hastup = false;
+	prstate->lpdead_items = 0;
+	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
+	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
+	 * Vacuum may update the VM after we're done.  We can keep track of
 	 * whether the page will be all-visible and all-frozen after pruning and
 	 * freezing to help the caller to do that.
 	 *
@@ -570,10 +401,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * that after scanning the line pointers, before we return the value to
 	 * the caller, so that the caller doesn't set the VM bits incorrectly.
 	 */
-	if (prstate.attempt_freeze)
+	if (prstate->attempt_freeze)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = true;
+		prstate->all_visible = true;
+		prstate->all_frozen = true;
 	}
 	else
 	{
@@ -581,8 +412,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Initializing to false allows skipping the work to update them in
 		 * heap_prune_record_unchanged_lp_normal().
 		 */
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
 	}
 
 	/*
@@ -593,10 +424,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * running transaction on the standby does not see tuples on the page as
 	 * all-visible, so the conflict horizon remains InvalidTransactionId.
 	 */
-	prstate.visibility_cutoff_xid = InvalidTransactionId;
+	prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
 
-	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(params->relation);
+/*
+ * Helper for heap_page_prune_and_freeze(). Iterates over every tuple on the
+ * page, examines its visibility information, and determines the appropriate
+ * action for each tuple. All tuples are processed and classified during this
+ * phase, but no modifications are made to the page until the later execution
+ * stage.
+ *
+ * *off_loc is used for error callback and cleared before returning.
+ */
+static void
+prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
+				  OffsetNumber *off_loc)
+{
+	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber offnum;
+	HeapTupleData tup;
+
+	tup.t_tableOid = reloid;
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -631,13 +481,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		*off_loc = offnum;
 
-		prstate.processed[offnum] = false;
-		prstate.htsv[offnum] = -1;
+		prstate->processed[offnum] = false;
+		prstate->htsv[offnum] = -1;
 
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
 			continue;
 		}
 
@@ -647,17 +497,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * If the caller set mark_unused_now true, we can set dead line
 			 * pointers LP_UNUSED now.
 			 */
-			if (unlikely(prstate.mark_unused_now))
-				heap_prune_record_unused(&prstate, offnum, false);
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
 			continue;
 		}
 
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* This is the start of a HOT chain */
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 			continue;
 		}
 
@@ -671,21 +521,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		tup.t_len = ItemIdGetLength(itemid);
 		ItemPointerSet(&tup.t_self, blockno, offnum);
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
+															buffer);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 		else
-			prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+			prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
 	}
 
-	/*
-	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
-	 * an FPI to be emitted.
-	 */
-	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
 	/*
 	 * Process HOT chains.
 	 *
@@ -697,30 +541,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * the page instead of using the root_items array, also did it in
 	 * ascending offset number order.)
 	 */
-	for (int i = prstate.nroot_items - 1; i >= 0; i--)
+	for (int i = prstate->nroot_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.root_items[i];
+		offnum = prstate->root_items[i];
 
 		/* Ignore items already processed as part of an earlier chain */
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
 	}
 
 	/*
 	 * Process any heap-only tuples that were not already processed as part of
 	 * a HOT chain.
 	 */
-	for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+	for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.heaponly_items[i];
+		offnum = prstate->heaponly_items[i];
 
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
@@ -739,7 +583,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * return true for an XMIN_INVALID tuple, so this code will work even
 		 * when there were sequential updates within the aborted transaction.)
 		 */
-		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+		if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
 		{
 			ItemId		itemid = PageGetItemId(page, offnum);
 			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -747,8 +591,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
 			{
 				HeapTupleHeaderAdvanceConflictHorizon(htup,
-													  &prstate.latest_xid_removed);
-				heap_prune_record_unused(&prstate, offnum, true);
+													  &prstate->latest_xid_removed);
+				heap_prune_record_unused(prstate, offnum, true);
 			}
 			else
 			{
@@ -765,7 +609,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -776,12 +620,223 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	{
 		*off_loc = offnum;
 
-		Assert(prstate.processed[offnum]);
+		Assert(prstate->processed[offnum]);
 	}
 #endif
 
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Decide whether to proceed with freezing according to the freeze plans
+ * prepared for the given heap buffer. If freezing is chosen, this function
+ * performs several pre-freeze checks.
+ *
+ * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
+ * determined before calling this function.
+ *
+ * prstate is both an input and output parameter.
+ *
+ * Returns true if we should apply the freeze plans and freeze tuples on the
+ * page, and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and return.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			Assert(prstate->all_visible);
+
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(prstate->frz_conflict_horizon);
+		}
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * Caller must have pin and buffer cleanup lock on the page.  Note that we
+ * don't update the FSM information for page on caller's behalf.  Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen after determining whether or not to
+ * opporunistically freeze, to indicate if the VM bits can be set.  They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.  Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+						   PruneFreezeResult *presult,
+						   OffsetNumber *off_loc,
+						   TransactionId *new_relfrozen_xid,
+						   MultiXactId *new_relmin_mxid)
+{
+	Buffer		buffer = params->buffer;
+	Page		page = BufferGetPage(buffer);
+	PruneState	prstate;
+	bool		do_freeze;
+	bool		do_prune;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
+	int64		fpi_before = pgWalUsage.wal_fpi;
+
+	/* Initialize prstate */
+	prune_freeze_setup(params,
+					   new_relfrozen_xid ?
+					   *new_relfrozen_xid : InvalidTransactionId,
+					   new_relmin_mxid ?
+					   *new_relmin_mxid : InvalidMultiXactId,
+					   presult,
+					   &prstate);
+
+	/*
+	 * Examine all line pointers and tuple visibility information to determine
+	 * which line pointers should change state and which tuples may be frozen.
+	 * Prepare queue of state changes to later be executed in a critical
+	 * section.
+	 */
+	prune_freeze_plan(RelationGetRelid(params->relation),
+					  buffer, &prstate, off_loc);
+
+	/*
+	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
+	 * checking tuple visibility information in prune_freeze_plan() may have
+	 * caused an FPI to be emitted.
+	 */
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	do_prune = prstate.nredirected > 0 ||
 		prstate.ndead > 0 ||
-- 
2.43.0
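
Reader's note on the patch above: the opportunistic-freeze decision that
heap_page_will_freeze() makes can be modeled as a pure function of a handful
of booleans. The sketch below is only an approximation for following the
control flow. FreezeInputs and freeze_decision are hypothetical names that do
not appear in the patch, and the PostgreSQL calls (RelationNeedsWAL(),
XLogHintBitIsNeeded(), XLogCheckBufferNeedsBackup()) are replaced by plain
flags that the caller is assumed to have evaluated already. The side effects
of the real function (pre-freeze checks, conflict horizon, resetting
all_frozen/nfrozen) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the state heap_page_will_freeze() consults.
 * Each flag replaces a PruneState field or a PostgreSQL call.
 */
typedef struct FreezeInputs
{
	bool		attempt_freeze;		/* HEAP_PAGE_PRUNE_FREEZE was passed */
	bool		freeze_required;	/* pagefrz.freeze_required */
	bool		all_visible;		/* prstate->all_visible */
	bool		all_frozen;			/* page would be all-frozen afterwards */
	int			nfrozen;			/* number of prepared freeze plans */
	bool		needs_wal;			/* stands in for RelationNeedsWAL() */
	bool		did_tuple_hint_fpi; /* hint-bit FPI already emitted */
	bool		do_prune;
	bool		do_hint_prune;
	bool		hint_bits_logged;	/* stands in for XLogHintBitIsNeeded() */
	bool		buffer_needs_backup;	/* XLogCheckBufferNeedsBackup() */
} FreezeInputs;

/* Returns true if the prepared freeze plans should be applied. */
bool
freeze_decision(const FreezeInputs *in)
{
	/* Caller did not ask for freezing at all. */
	if (!in->attempt_freeze)
		return false;

	/* Freezing is mandatory to advance relfrozenxid/relminmxid. */
	if (in->freeze_required)
		return true;

	/* Opportunistic path: only if freezing makes the page all-frozen. */
	if (!(in->all_frozen && in->nfrozen > 0))
		return false;
	assert(in->all_visible);

	/* Without WAL there is no FPI cost to piggyback on. */
	if (!in->needs_wal)
		return false;

	/* Freeze only if an FPI was already emitted or will be anyway. */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_backup;
	if (in->do_hint_prune)
		return in->hint_bits_logged && in->buffer_needs_backup;
	return false;
}
```

Laid out this way, the ordering is easier to see: freeze_required
short-circuits everything else, and the opportunistic branch never fires
unless some form of full-page image is already being paid for.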



  [text/x-patch] v21-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (43.4K, 6-v21-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
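
Reader's note on the patch below: its redo-side hunk in heapam_xlog.c decides
separately whether to dirty the heap buffer and whether to stamp the page LSN.
The following sketch models just that decision. RedoActions and
redo_heap_page_actions are hypothetical names used only for illustration, the
VM_* constants are local stand-ins for the VISIBILITYMAP_* flags, and
hint_bits_logged stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VISIBILITYMAP_* flag bits. */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02
#define VM_VALID_BITS	(VM_ALL_VISIBLE | VM_ALL_FROZEN)

/* What redo would do to the heap page (hypothetical helper type). */
typedef struct RedoActions
{
	bool		set_pd_all_visible; /* call PageSetAllVisible() */
	bool		mark_dirty;			/* call MarkBufferDirty() */
	bool		set_lsn;			/* call PageSetLSN() */
} RedoActions;

RedoActions
redo_heap_page_actions(bool do_prune, int nplans, uint8_t vmflags,
					   bool page_all_visible, bool hint_bits_logged)
{
	RedoActions a = {false, false, false};

	assert((vmflags & ~VM_VALID_BITS) == 0);

	/* Pruning or freezing always modifies the heap page. */
	if (do_prune || nplans > 0)
		a.mark_dirty = a.set_lsn = true;

	/*
	 * The VM bit must never be set while PD_ALL_VISIBLE is clear, so set
	 * the page-level bit first if it is missing.  If it is already set and
	 * only the VM changes, the heap page is left untouched.
	 */
	if ((vmflags & VM_VALID_BITS) && !page_all_visible)
	{
		a.set_pd_all_visible = true;
		a.mark_dirty = true;
		if (hint_bits_logged)
			a.set_lsn = true;
	}

	return a;
}
```

For example, marking an already all-visible page all-frozen with no pruning
and no freeze plans yields no heap-page actions at all in this model, which
matches the case the patch's comment calls out.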
From a9af84665ae761e9fba46f835a5efd849739da23 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v21 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c |  37 ++-
 src/backend/access/heap/pruneheap.c   | 461 +++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c  | 241 +-------------
 src/include/access/heapam.h           |  43 ++-
 4 files changed, 447 insertions(+), 335 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b10c5eb1163..ba578c1ce0f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -188,10 +191,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -278,6 +292,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * current implementation.
 			 */
 			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+				.vmbuffer = InvalidBuffer,.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,.options = 0,
 				.vistest = vistest,.cutoffs = NULL
 			};
@@ -331,6 +346,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -379,50 +396,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers, before we return the value to
-	 * the caller, so that the caller doesn't set the VM bits incorrectly.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -757,10 +778,133 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
+
+
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -775,12 +919,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -805,13 +950,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -877,6 +1029,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -898,14 +1078,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -919,36 +1102,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
+		{
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -958,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1471,6 +1680,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
@@ -2121,6 +2332,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2145,6 +2415,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2154,6 +2433,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2190,7 +2470,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2308,7 +2588,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e1b7456823d..a7a974b6639 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -1966,7 +1952,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {.relation = rel,.buffer = buf,
-		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+		.vmbuffer = vmbuffer,.blk_known_av = all_visible_according_to_vm,
+		.reason = PRUNE_VACUUM_SCAN,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
 		.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
 	};
 
@@ -2009,33 +1997,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2069,168 +2030,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2952,6 +2771,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -3632,30 +3452,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
-
 /*
  * Check whether the heap page in buf is all-visible except for the dead
  * tuples referenced in the deadoffsets array.
@@ -3678,15 +3474,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..937b46a77db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -285,19 +298,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -424,6 +433,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
@@ -433,6 +443,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v21-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 7-v21-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From e117e20aebcbc4b3bfe5b077d9f122e171a8c6fe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v21 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7a974b6639..fa7be0f857f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v21-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.2K, 8-v21-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From 1d3f9e8f397508808c01bcc827294014eac5b19b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v21 07/12] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type. This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 54 insertions(+), 379 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4b0c49f4bb0..2bff37e03b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba578c1ce0f..80037d690e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1112,9 +1112,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa7be0f857f..fd68dfcfce2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c751c25a04d..2a9951b7188 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4292,7 +4292,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
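
After this patch, visibilitymap_set() only flips bits in an already-pinned VM page and returns the previous bits. The callers shown in the diff handle the LSN themselves (PageSetLSN() after the call). The sketch below is a toy model of that bit arithmetic, based on the status/shift logic visible in the removed function. The ToyVMPage type, the toy_* names, and the 64-byte map size are assumptions for illustration; buffer locking, pin checks, and WAL are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as they appear in visibilitymapdefs.h in the diff above. */
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02
#define VISIBILITYMAP_VALID_BITS    0x03

/* Toy layout: two bits per heap block, so four heap blocks per map byte. */
#define TOY_BITS_PER_HEAPBLOCK      2
#define TOY_HEAPBLOCKS_PER_BYTE     4
#define TOY_MAP_BYTES               64  /* hypothetical; the real size derives from BLCKSZ */
#define TOY_HEAPBLOCKS_PER_PAGE     (TOY_MAP_BYTES * TOY_HEAPBLOCKS_PER_BYTE)

/* Hypothetical stand-in for a pinned, exclusively locked VM buffer. */
typedef struct ToyVMPage
{
    uint8_t     map[TOY_MAP_BYTES];
    int         dirty;          /* plays the role of MarkBufferDirty() */
} ToyVMPage;

/*
 * Set 'flags' for heapBlk and return the bits that were set before the call.
 * The page is dirtied only when the stored bits differ from 'flags'.
 */
uint8_t
toy_vm_set(ToyVMPage *vm, uint32_t heapBlk, uint8_t flags)
{
    uint32_t    mapByte = (heapBlk % TOY_HEAPBLOCKS_PER_PAGE) / TOY_HEAPBLOCKS_PER_BYTE;
    uint8_t     mapOffset = (uint8_t) ((heapBlk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK);
    uint8_t     status;

    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    /* all-frozen must never be set without all-visible */
    assert(flags != VISIBILITYMAP_ALL_FROZEN);

    status = (vm->map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
    if (flags != status)
    {
        vm->map[mapByte] |= (uint8_t) (flags << mapOffset);
        vm->dirty = 1;
    }
    return status;
}
```

The return value is what lets a caller such as heap_page_prune_and_freeze() compare old_vmbits with new_vmbits and notice that nothing changed.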



  [text/x-patch] v21-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.0K, 9-v21-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 2029bdec49e880e8d3453cd7a2246a93e69b867d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v21 08/12] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Kirill Reshke <[email protected]>
---
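
For readers unfamiliar with the function being renamed: it answers "is this XID visible to everyone?" using two bounds kept in GlobalVisState. The toy model below sketches that two-bound test. It assumes the structure suggested by the comments in procarray.c (a maybe_needed and a definitely_needed bound, with a recomputation when an XID falls between them). The ToyVisState type, the plain 64-bit XIDs, and the refresh callback standing in for GlobalVisUpdate() are illustrative assumptions, not the real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for GlobalVisState. */
typedef struct ToyVisState
{
    uint64_t    maybe_needed;       /* XIDs older than this are visible to all */
    uint64_t    definitely_needed;  /* XIDs at or after this may still be needed */
} ToyVisState;

/* Hypothetical callback that recomputes the bounds more precisely. */
typedef void (*ToyRefreshFn) (ToyVisState *state);

/* Example refresh for the sketch: pretend the horizon caught up completely. */
void
toy_refresh_advance(ToyVisState *state)
{
    state->maybe_needed = state->definitely_needed;
}

bool
toy_xid_visible_to_all(ToyVisState *state, uint64_t fxid, ToyRefreshFn refresh)
{
    /* Older than the lower bound: certainly visible to every snapshot. */
    if (fxid < state->maybe_needed)
        return true;

    /* At or past the upper bound: certainly not visible to all. */
    if (fxid >= state->definitely_needed)
        return false;

    /* In between: the bounds are too coarse, so refresh them and retest. */
    if (refresh != NULL)
    {
        refresh(state);
        return fxid < state->maybe_needed;
    }
    return false;
}
```

The same test serves both uses named in the commit message: applied to an xmax it says the tuple is removable, applied to an xmin it says the tuple is visible to all, which is why the new name describes the XID rather than the removal.
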
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80037d690e3..989af765702 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -250,7 +250,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -476,7 +476,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1238,11 +1238,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1701,7 +1701,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v21-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 10-v21-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 2234e4bc98c173d740c27aa55347e92baec3e6d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v21 09/12] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
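
The deferred check described in the commit message can be sketched as follows: track the newest xmin while scanning the page, then run the more expensive horizon test once on that single value. This is a toy model under stated assumptions: every tuple is taken to be live with a committed xmin, XIDs are plain 64-bit integers with 0 playing the role of InvalidTransactionId, and the toy_* names and the callback type are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a GlobalVisState-based visibility test. */
typedef bool (*ToyVisibleToAllFn) (uint64_t xid, void *arg);

/* Simple horizon for the sketch; counts how often it is consulted. */
typedef struct ToyHorizon
{
    uint64_t    horizon;
    int         nchecks;
} ToyHorizon;

bool
toy_older_than_horizon(uint64_t xid, void *arg)
{
    ToyHorizon *h = (ToyHorizon *) arg;

    h->nchecks++;
    return xid < h->horizon;
}

/*
 * Return true if every xmin on the "page" is visible to all.  The newest
 * xmin is reported through cutoff_out, mirroring visibility_cutoff_xid.
 */
bool
toy_page_all_visible(const uint64_t *xmins, size_t ntuples,
                     ToyVisibleToAllFn visible_to_all, void *arg,
                     uint64_t *cutoff_out)
{
    uint64_t    newest = 0;     /* 0 means no xmin seen yet */

    /* Cheap pass: only integer comparisons per tuple. */
    for (size_t i = 0; i < ntuples; i++)
    {
        if (xmins[i] > newest)
            newest = xmins[i];
    }
    *cutoff_out = newest;

    /* A page with no tuples has nothing that could be invisible. */
    if (newest == 0)
        return true;

    /* Expensive test, once per page: newest visible implies all visible. */
    return visible_to_all(newest, arg);
}
```

The trade-off named in the commit message shows up directly: the loop always visits every tuple, whereas a comparison against a fixed OldestXmin could stop at the first xmin that is too new.
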
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 989af765702..040efe80f2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -439,11 +439,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -886,14 +887,13 @@ heap_page_will_set_vis(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -994,6 +994,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1165,10 +1175,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1698,20 +1707,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fd68dfcfce2..fdf37625cd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3489,7 +3489,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3505,7 +3505,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3579,7 +3579,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 937b46a77db..2b6a521e4ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,10 +276,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -458,6 +457,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v21-0010-Unset-all_visible-sooner-if-not-freezing.patch (2.4K, 11-v21-0010-Unset-all_visible-sooner-if-not-freezing.patch)
From 58242c95a3e737f3659913f24b22219dbafe1951 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v21 10/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 040efe80f2e..90270081acd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1564,8 +1564,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1824,8 +1829,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v21-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (37.7K, 12-v21-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From f962eee2760f7f0927a318ac05b55e48eea3cec0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v21 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
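
The way the read-only hint gates VM setting can be sketched roughly as
follows. This is a minimal, self-contained illustration under stated
assumptions, not the patch's code: ScanState, HINT_REL_READ_ONLY (and
its bit value) and vmbuffer_for_prune() are hypothetical names standing
in for the scan descriptor, the SO_HINT_REL_READ_ONLY flag and the
caller-side logic. The idea is that a NULL vmbuffer pointer tells the
pruning code not to touch the visibility map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the PostgreSQL definitions. */
typedef int Buffer;
#define INVALID_BUFFER		0
#define HINT_REL_READ_ONLY	(1u << 0)	/* assumed bit value */

typedef struct ScanState
{
	uint32_t	flags;		/* hints passed down from the executor */
	Buffer		vmbuffer;	/* VM page pin kept across heap pages */
} ScanState;

/*
 * Return where pruning may keep its VM pin, or NULL if the query
 * modifies the relation and on-access VM setting should be skipped.
 */
static Buffer *
vmbuffer_for_prune(ScanState *scan)
{
	if (scan->flags & HINT_REL_READ_ONLY)
		return &scan->vmbuffer;
	return NULL;
}
```

A scan created with the read-only hint gets a pointer to its own
vmbuffer slot, so the pin can be reused for subsequent heap pages; a
scan without the hint gets NULL and pruning behaves as before.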

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c               |  2 +-
 src/backend/access/brin/brin.c                |  3 +-
 src/backend/access/gin/gininsert.c            |  3 +-
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 22 ++++--
 src/backend/access/heap/pruneheap.c           | 69 +++++++++++++++----
 src/backend/access/index/genam.c              |  4 +-
 src/backend/access/index/indexam.c            |  6 +-
 src/backend/access/nbtree/nbtsort.c           |  2 +-
 src/backend/access/table/tableam.c            |  8 ++-
 src/backend/commands/constraint.c             |  2 +-
 src/backend/commands/copyto.c                 |  2 +-
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/commands/typecmds.c               |  4 +-
 src/backend/executor/execIndexing.c           |  2 +-
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execReplication.c        |  8 +--
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  9 ++-
 src/backend/executor/nodeIndexonlyscan.c      |  2 +-
 src/backend/executor/nodeIndexscan.c          | 11 ++-
 src/backend/executor/nodeSeqscan.c            | 26 ++++++-
 src/backend/partitioning/partbounds.c         |  2 +-
 src/backend/utils/adt/selfuncs.c              |  2 +-
 src/include/access/genam.h                    |  3 +-
 src/include/access/heapam.h                   | 30 +++++++-
 src/include/access/tableam.h                  | 19 ++---
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 29 files changed, 210 insertions(+), 65 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c2b879b2bf6..147844690a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2bff37e03b5..ae53e311ce1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,14 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -99,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -753,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 90270081acd..124722f1778 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -203,7 +203,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -218,9 +220,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -297,6 +303,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.vistest = vistest,.cutoffs = NULL
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -785,6 +798,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -795,7 +811,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -811,6 +829,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -834,6 +869,11 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * This will never trigger for on-access pruning because it couldn't have
+	 * done a previous visibility map lookup and thus will always pass
+	 * blk_known_av as false. A future vacuum will have to take care of fixing
+	 * the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -994,6 +1034,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -1004,14 +1052,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -1054,6 +1094,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2340,7 +2381,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2410,8 +2451,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af1310..1e7992dbeb3 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..558c4497993 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -50,6 +50,7 @@ char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
 
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -163,10 +164,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -217,7 +219,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 23ebaa3f230..66c418059fe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5979580139f..35560ac60d9 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3154,7 +3154,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3235,7 +3235,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 401606f840a..4e39ac00f30 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..2f9e9ea6318 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +204,7 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2b6a521e4ea..1e3df54628b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,24 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +440,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..0042636463f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1154,9 +1157,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
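
The executor hunks above all repeat one decision: if the scan's range-table index is absent from es_modified_relids, the scan is started with SO_HINT_REL_READ_ONLY. A minimal standalone sketch of that decision follows. It assumes a 64-bit mask (RelidSet, a hypothetical stand-in for Bitmapset) and hypothetical helper names (relidset_add, relidset_is_member, scan_flags_for); only the flag value is taken from the tableam.h hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value as added to ScanOptions in the tableam.h hunk above. */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for Bitmapset: one bit per range-table index.
 * Only valid for indexes below 64; the real Bitmapset has no such limit.
 */
typedef uint64_t RelidSet;

/* Models bms_add_member(): record that the query modifies this relation. */
static RelidSet
relidset_add(RelidSet set, unsigned rti)
{
	assert(rti < 64);
	return set | (UINT64_C(1) << rti);
}

/* Models bms_is_member(). */
static bool
relidset_is_member(RelidSet set, unsigned rti)
{
	assert(rti < 64);
	return ((set >> rti) & 1) != 0;
}

/*
 * Models the check repeated in SeqNext(), IndexNext() and
 * BitmapTableScanSetup(): hint read-only unless the scanned relation is a
 * result relation or carries a row mark.
 */
static uint32_t
scan_flags_for(RelidSet modified_relids, unsigned scanrelid)
{
	uint32_t	flags = 0;

	if (!relidset_is_member(modified_relids, scanrelid))
		flags = SO_HINT_REL_READ_ONLY;

	return flags;
}
```

With range-table index 2 marked as modified, scan_flags_for() returns 0 for index 2 and SO_HINT_REL_READ_ONLY for any other index. In the patch, table_beginscan() then ORs the scan-type flags into whatever the caller passed.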



  [text/x-patch] v21-0012-Set-pd_prune_xid-on-insert.patch (6.7K, 13-v21-0012-Set-pd_prune_xid-on-insert.patch)
From 4e0febe03cd305e81cb73235d750901e9ef379f0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v21 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ae53e311ce1..f329f497480 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
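
For readers unfamiliar with the hint, here is a simplified standalone model of what setting pd_prune_xid on insert amounts to. It assumes that PageSetPrunable() keeps the oldest normal XID seen for the page, which is my reading of bufpage.h rather than something this patch states. ToyPageHeader, xid_is_normal(), xid_precedes() and toy_page_set_prunable() are hypothetical names; the early return for non-normal XIDs mirrors the TransactionIdIsNormal() guard the patch adds in heap_insert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical page header carrying only the field of interest. */
typedef struct ToyPageHeader
{
	TransactionId pd_prune_xid;
} ToyPageHeader;

/* Bootstrap and frozen XIDs sort below the first normal XID. */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 comparison of two normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Record xid as the page's prune hint if no hint is set yet or if xid is
 * older than the current hint. Non-normal XIDs (e.g. bootstrap mode) are
 * ignored, as in the guarded calls in the patch.
 */
static void
toy_page_set_prunable(ToyPageHeader *page, TransactionId xid)
{
	if (!xid_is_normal(xid))
		return;

	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}
```

Starting from an unset hint, calls with XIDs 100, 200 and 50 leave pd_prune_xid at 100, 100 and 50 respectively; a call with the bootstrap XID changes nothing. A later reader of the page can then compare the hint against its visibility horizon to decide whether pruning is worth attempting.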




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-20 17:19  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 3 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-11-20 17:19 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Nov 19, 2025 at 6:13 PM Melanie Plageman
<[email protected]> wrote:
>
> Since it is passed into one of the helpers, I think I agree. Attached
> v21 has this change.

I've committed the first three patches. Attached v22 is the remaining
patches which set the VM in heap_page_prune_and_freeze() for vacuum
and then allow on-access pruning to also set the VM.

- Melanie


Attachments:

  [text/x-patch] v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch (25.4K, 2-v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch)
From 363f0e4ac9ac7699a6d9c2a267a2ad60825407c8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers

Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it more
clear when the examination of tuples ends and page modifications begin.
---
 src/backend/access/heap/pruneheap.c | 559 +++++++++++++++-------------
 1 file changed, 307 insertions(+), 252 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1850476dcd8..1460193b920 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -157,6 +157,14 @@ typedef struct
 } PruneState;
 
 /* Local functions */
+static void prune_freeze_setup(PruneFreezeParams *params,
+							   TransactionId new_relfrozen_xid,
+							   MultiXactId new_relmin_mxid,
+							   const PruneFreezeResult *presult,
+							   PruneState *prstate);
+static void prune_freeze_plan(Oid reloid, Buffer buffer,
+							  PruneState *prstate,
+							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 											   HeapTuple tup,
 											   Buffer buffer);
@@ -308,200 +316,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 }
 
 /*
- * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
- * performs several pre-freeze checks.
- *
- * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
- *
- * prstate is both an input and output parameter.
- *
- * Returns true if we should apply the freeze plans and freeze tuples on the
- * page, and false otherwise.
+ * Helper for heap_page_prune_and_freeze() to initialize the PruneState using
+ * the provided parameters.
  */
-static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
-					  bool did_tuple_hint_fpi,
-					  bool do_prune,
-					  bool do_hint_prune,
-					  PruneState *prstate)
-{
-	bool		do_freeze = false;
-
-	/*
-	 * If the caller specified we should not attempt to freeze any tuples,
-	 * validate that everything is in the right state and return.
-	 */
-	if (!prstate->attempt_freeze)
-	{
-		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
-		return false;
-	}
-
-	if (prstate->pagefrz.freeze_required)
-	{
-		/*
-		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
-		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
-		 * advance relfrozenxid/relminmxid.
-		 */
-		do_freeze = true;
-	}
-	else
-	{
-		/*
-		 * Opportunistically freeze the page if we are generating an FPI
-		 * anyway and if doing so means that we can set the page all-frozen
-		 * afterwards (might not happen until VACUUM's final heap pass).
-		 *
-		 * XXX: Previously, we knew if pruning emitted an FPI by checking
-		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
-		 * prune records were combined, this heuristic couldn't be used
-		 * anymore.  The opportunistic freeze heuristic must be improved;
-		 * however, for now, try to approximate the old logic.
-		 */
-		if (prstate->all_frozen && prstate->nfrozen > 0)
-		{
-			Assert(prstate->all_visible);
-
-			/*
-			 * Freezing would make the page all-frozen.  Have already emitted
-			 * an FPI or will do so anyway?
-			 */
-			if (RelationNeedsWAL(relation))
-			{
-				if (did_tuple_hint_fpi)
-					do_freeze = true;
-				else if (do_prune)
-				{
-					if (XLogCheckBufferNeedsBackup(buffer))
-						do_freeze = true;
-				}
-				else if (do_hint_prune)
-				{
-					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-						do_freeze = true;
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
-	}
-	else if (prstate->nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate->pagefrz.freeze_required);
-
-		prstate->all_frozen = false;
-		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
-
-	return do_freeze;
-}
-
-
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
- *
- * Caller must have pin and buffer cleanup lock on the page.  Note that we
- * don't update the FSM information for page on caller's behalf.  Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
- * tuples if it's required in order to advance relfrozenxid / relminmxid, or
- * if it's considered advantageous for overall system performance to do so
- * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.  Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far.  They will be updated
- * with oldest values present on the page after pruning.  After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
-						   PruneFreezeResult *presult,
-						   OffsetNumber *off_loc,
-						   TransactionId *new_relfrozen_xid,
-						   MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+				   TransactionId new_relfrozen_xid,
+				   MultiXactId new_relmin_mxid,
+				   const PruneFreezeResult *presult,
+				   PruneState *prstate)
 {
-	Buffer		buffer = params->buffer;
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber offnum,
-				maxoff;
-	PruneState	prstate;
-	HeapTupleData tup;
-	bool		do_freeze;
-	bool		do_prune;
-	bool		do_hint_prune;
-	bool		did_tuple_hint_fpi;
-	int64		fpi_before = pgWalUsage.wal_fpi;
-
 	/* Copy parameters to prstate */
-	prstate.vistest = params->vistest;
-	prstate.mark_unused_now =
+	prstate->vistest = params->vistest;
+	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = params->cutoffs;
+	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -514,41 +344,42 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * prunable, we will save the lowest relevant XID in new_prune_xid. Also
 	 * initialize the rest of our working state.
 	 */
-	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.latest_xid_removed = InvalidTransactionId;
-	prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
-	prstate.nroot_items = 0;
-	prstate.nheaponly_items = 0;
+	prstate->new_prune_xid = InvalidTransactionId;
+	prstate->latest_xid_removed = InvalidTransactionId;
+	prstate->nredirected = prstate->ndead = prstate->nunused = 0;
+	prstate->nfrozen = 0;
+	prstate->nroot_items = 0;
+	prstate->nheaponly_items = 0;
 
 	/* initialize page freezing working state */
-	prstate.pagefrz.freeze_required = false;
-	if (prstate.attempt_freeze)
+	prstate->pagefrz.freeze_required = false;
+	if (prstate->attempt_freeze)
 	{
-		Assert(new_relfrozen_xid && new_relmin_mxid);
-		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
-		prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
-		prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+		prstate->pagefrz.FreezePageRelfrozenXid = new_relfrozen_xid;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = new_relfrozen_xid;
+		prstate->pagefrz.FreezePageRelminMxid = new_relmin_mxid;
+		prstate->pagefrz.NoFreezePageRelminMxid = new_relmin_mxid;
 	}
 	else
 	{
-		Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
-		prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
-		prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
-		prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+		Assert(new_relfrozen_xid == InvalidTransactionId &&
+			   new_relmin_mxid == InvalidMultiXactId);
+		prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+		prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+		prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
 	}
 
-	prstate.ndeleted = 0;
-	prstate.live_tuples = 0;
-	prstate.recently_dead_tuples = 0;
-	prstate.hastup = false;
-	prstate.lpdead_items = 0;
-	prstate.deadoffsets = presult->deadoffsets;
-	prstate.frz_conflict_horizon = InvalidTransactionId;
+	prstate->ndeleted = 0;
+	prstate->live_tuples = 0;
+	prstate->recently_dead_tuples = 0;
+	prstate->hastup = false;
+	prstate->lpdead_items = 0;
+	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
+	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
+	 * Vacuum may update the VM after we're done.  We can keep track of
 	 * whether the page will be all-visible and all-frozen after pruning and
 	 * freezing to help the caller to do that.
 	 *
@@ -571,10 +402,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * all_frozen before we return them to the caller, so that the caller
 	 * doesn't set the VM bits incorrectly.
 	 */
-	if (prstate.attempt_freeze)
+	if (prstate->attempt_freeze)
 	{
-		prstate.all_visible = true;
-		prstate.all_frozen = true;
+		prstate->all_visible = true;
+		prstate->all_frozen = true;
 	}
 	else
 	{
@@ -582,8 +413,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Initializing to false allows skipping the work to update them in
 		 * heap_prune_record_unchanged_lp_normal().
 		 */
-		prstate.all_visible = false;
-		prstate.all_frozen = false;
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
 	}
 
 	/*
@@ -594,10 +425,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * running transaction on the standby does not see tuples on the page as
 	 * all-visible, so the conflict horizon remains InvalidTransactionId.
 	 */
-	prstate.visibility_cutoff_xid = InvalidTransactionId;
+	prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
 
-	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(params->relation);
+/*
+ * Helper for heap_page_prune_and_freeze(). Iterates over every tuple on the
+ * page, examines its visibility information, and determines the appropriate
+ * action for each tuple. All tuples are processed and classified during this
+ * phase, but no modifications are made to the page until the later execution
+ * stage.
+ *
+ * *off_loc is used for error callback and cleared before returning.
+ */
+static void
+prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
+				  OffsetNumber *off_loc)
+{
+	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	OffsetNumber offnum;
+	HeapTupleData tup;
+
+	tup.t_tableOid = reloid;
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -632,13 +482,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		*off_loc = offnum;
 
-		prstate.processed[offnum] = false;
-		prstate.htsv[offnum] = -1;
+		prstate->processed[offnum] = false;
+		prstate->htsv[offnum] = -1;
 
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
 			continue;
 		}
 
@@ -648,17 +498,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * If the caller set mark_unused_now true, we can set dead line
 			 * pointers LP_UNUSED now.
 			 */
-			if (unlikely(prstate.mark_unused_now))
-				heap_prune_record_unused(&prstate, offnum, false);
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
 			continue;
 		}
 
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* This is the start of a HOT chain */
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 			continue;
 		}
 
@@ -672,21 +522,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		tup.t_len = ItemIdGetLength(itemid);
 		ItemPointerSet(&tup.t_self, blockno, offnum);
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
+															buffer);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
-			prstate.root_items[prstate.nroot_items++] = offnum;
+			prstate->root_items[prstate->nroot_items++] = offnum;
 		else
-			prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+			prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
 	}
 
-	/*
-	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
-	 * an FPI to be emitted.
-	 */
-	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
 	/*
 	 * Process HOT chains.
 	 *
@@ -698,30 +542,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * the page instead of using the root_items array, also did it in
 	 * ascending offset number order.)
 	 */
-	for (int i = prstate.nroot_items - 1; i >= 0; i--)
+	for (int i = prstate->nroot_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.root_items[i];
+		offnum = prstate->root_items[i];
 
 		/* Ignore items already processed as part of an earlier chain */
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
 	}
 
 	/*
 	 * Process any heap-only tuples that were not already processed as part of
 	 * a HOT chain.
 	 */
-	for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+	for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
 	{
-		offnum = prstate.heaponly_items[i];
+		offnum = prstate->heaponly_items[i];
 
-		if (prstate.processed[offnum])
+		if (prstate->processed[offnum])
 			continue;
 
 		/* see preceding loop */
@@ -740,7 +584,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * return true for an XMIN_INVALID tuple, so this code will work even
 		 * when there were sequential updates within the aborted transaction.)
 		 */
-		if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+		if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
 		{
 			ItemId		itemid = PageGetItemId(page, offnum);
 			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -748,8 +592,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
 			{
 				HeapTupleHeaderAdvanceConflictHorizon(htup,
-													  &prstate.latest_xid_removed);
-				heap_prune_record_unused(&prstate, offnum, true);
+													  &prstate->latest_xid_removed);
+				heap_prune_record_unused(prstate, offnum, true);
 			}
 			else
 			{
@@ -766,7 +610,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -777,12 +621,223 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	{
 		*off_loc = offnum;
 
-		Assert(prstate.processed[offnum]);
+		Assert(prstate->processed[offnum]);
 	}
 #endif
 
 	/* Clear the offset information once we have processed the given page. */
 	*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Decide whether to proceed with freezing according to the freeze plans
+ * prepared for the given heap buffer. If freezing is chosen, this function
+ * performs several pre-freeze checks.
+ *
+ * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
+ * determined before calling this function.
+ *
+ * prstate is both an input and output parameter.
+ *
+ * Returns true if we should apply the freeze plans and freeze tuples on the
+ * page, and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and return.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			Assert(prstate->all_visible);
+
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it. Otherwise, we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(prstate->frz_conflict_horizon);
+		}
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * Caller must have pin and buffer cleanup lock on the page.  Note that we
+ * don't update the FSM information for page on caller's behalf.  Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set.  They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.  Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+						   PruneFreezeResult *presult,
+						   OffsetNumber *off_loc,
+						   TransactionId *new_relfrozen_xid,
+						   MultiXactId *new_relmin_mxid)
+{
+	Buffer		buffer = params->buffer;
+	Page		page = BufferGetPage(buffer);
+	PruneState	prstate;
+	bool		do_freeze;
+	bool		do_prune;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
+	int64		fpi_before = pgWalUsage.wal_fpi;
+
+	/* Initialize prstate */
+	prune_freeze_setup(params,
+					   new_relfrozen_xid ?
+					   *new_relfrozen_xid : InvalidTransactionId,
+					   new_relmin_mxid ?
+					   *new_relmin_mxid : InvalidMultiXactId,
+					   presult,
+					   &prstate);
+
+	/*
+	 * Examine all line pointers and tuple visibility information to determine
+	 * which line pointers should change state and which tuples may be frozen.
+	 * Prepare queue of state changes to later be executed in a critical
+	 * section.
+	 */
+	prune_freeze_plan(RelationGetRelid(params->relation),
+					  buffer, &prstate, off_loc);
+
+	/*
+	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
+	 * checking tuple visibility information in prune_freeze_plan() may have
+	 * caused an FPI to be emitted.
+	 */
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	do_prune = prstate.nredirected > 0 ||
 		prstate.ndead > 0 ||
-- 
2.43.0
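
Reader's note on the patch above: the decision made by heap_page_will_freeze()
can be restated as a small standalone function. This is only a sketch under
stated assumptions. The FreezeInputs struct and sketch_will_freeze() are
hypothetical names, and the boolean fields stand in for the results of
RelationNeedsWAL(), XLogHintBitIsNeeded() and XLogCheckBufferNeedsBackup(),
which the real code calls directly. It leaves out the pre-freeze checks and
the conflict horizon computation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state heap_page_will_freeze() consults. */
typedef struct FreezeInputs
{
	bool		attempt_freeze;		/* caller passed HEAP_PAGE_PRUNE_FREEZE */
	bool		freeze_required;	/* pagefrz.freeze_required */
	bool		all_visible;
	bool		all_frozen;
	int			nfrozen;			/* number of freeze plans prepared */
	bool		needs_wal;			/* assumed: RelationNeedsWAL() */
	bool		did_tuple_hint_fpi;
	bool		do_prune;
	bool		do_hint_prune;
	bool		hint_bits_need_wal; /* assumed: XLogHintBitIsNeeded() */
	bool		buffer_needs_backup;	/* assumed: XLogCheckBufferNeedsBackup() */
} FreezeInputs;

/* Returns true if the freeze plans should be executed. */
static bool
sketch_will_freeze(const FreezeInputs *in)
{
	if (!in->attempt_freeze)
		return false;

	/* Required freezing always wins. */
	if (in->freeze_required)
		return true;

	/* Opportunistic freezing only pays off if the page ends up all-frozen. */
	if (!(in->all_frozen && in->nfrozen > 0))
		return false;
	assert(in->all_visible);

	/* Without WAL there is no FPI to piggyback on. */
	if (!in->needs_wal)
		return false;

	/* Freeze if an FPI was already emitted or will be emitted anyway. */
	if (in->did_tuple_hint_fpi)
		return true;
	if (in->do_prune)
		return in->buffer_needs_backup;
	if (in->do_hint_prune)
		return in->hint_bits_need_wal && in->buffer_needs_backup;

	return false;
}
```

Flattening the nested conditions into early returns is a presentation choice
for the sketch; the branch order matches the patch.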



  [text/x-patch] v22-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (43.4K, 3-v22-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From 8ebaf434af5afaebcf71550116c59355b3bf15c1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c |  37 ++-
 src/backend/access/heap/pruneheap.c   | 462 +++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c  | 241 +-------------
 src/include/access/heapam.h           |  43 ++-
 4 files changed, 447 insertions(+), 336 deletions(-)
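
Reader's note: the redo-side change to heap_xlog_prune_freeze() below decides
separately whether the heap buffer is dirtied and whether its LSN is stamped.
The sketch that follows restates that decision as a pure function. It is an
illustration only. RedoActions and sketch_redo_actions() are hypothetical
names, vm_bits_requested stands for (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
and hint_bits_need_wal stands for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what redo does to the heap page. */
typedef struct RedoActions
{
	bool		set_pd_all_visible; /* call PageSetAllVisible() */
	bool		mark_buffer_dirty;	/* call MarkBufferDirty() */
	bool		set_lsn;			/* call PageSetLSN() */
} RedoActions;

static RedoActions
sketch_redo_actions(bool do_prune, int nplans, bool vm_bits_requested,
					bool page_all_visible, bool hint_bits_need_wal)
{
	RedoActions act = {false, false, false};

	assert(nplans >= 0);

	/* Pruning or freezing modified the page contents. */
	if (do_prune || nplans > 0)
		act.mark_buffer_dirty = act.set_lsn = true;

	/*
	 * PD_ALL_VISIBLE is set only when it is not set yet. If the record only
	 * updates the VM for an already all-visible page, the heap page is left
	 * untouched.
	 */
	if (vm_bits_requested && !page_all_visible)
	{
		act.set_pd_all_visible = true;
		act.mark_buffer_dirty = true;
		if (hint_bits_need_wal)
			act.set_lsn = true;
	}

	return act;
}
```

For example, a record that only marks an already all-visible page all-frozen
yields no heap page action at all, which is the case the new comment in the
hunk calls out.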

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1460193b920..ba578c1ce0f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -188,10 +191,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -278,6 +292,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * current implementation.
 			 */
 			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+				.vmbuffer = InvalidBuffer,.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,.options = 0,
 				.vistest = vistest,.cutoffs = NULL
 			};
@@ -331,6 +346,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -379,51 +396,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -758,10 +778,133 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
+
+
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -776,12 +919,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -806,13 +950,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -878,6 +1029,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -899,14 +1078,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -920,36 +1102,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turns out it was already set, we will
+		 * have unset do_set_vm earlier. As such, check it again before
+		 * emitting the record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
+		{
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -959,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1472,6 +1680,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
@@ -2122,6 +2332,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use that xmax as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2146,6 +2415,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2155,6 +2433,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2191,7 +2470,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2309,7 +2588,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7a6d6f42634..ef73eafb4f6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -1966,7 +1952,9 @@ lazy_scan_prune(LVRelState *vacrel,
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {.relation = rel,.buffer = buf,
-		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+		.vmbuffer = vmbuffer,.blk_known_av = all_visible_according_to_vm,
+		.reason = PRUNE_VACUUM_SCAN,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
 		.vistest = vacrel->vistest,.cutoffs = &vacrel->cutoffs
 	};
 
@@ -2009,33 +1997,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2069,168 +2030,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2952,6 +2771,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -3632,30 +3452,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
-
 /*
  * Check whether the heap page in buf is all-visible except for the dead
  * tuples referenced in the deadoffsets array.
@@ -3678,15 +3474,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..937b46a77db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -285,19 +298,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -424,6 +433,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
@@ -433,6 +443,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0

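For readers following the conflict-horizon logic in the patch above, here is a standalone sketch of the decision that get_conflict_xid() makes. It is not the patch's code: the typedef, the macros, the simplified TransactionIdFollows() and the name sketch_get_conflict_xid() are stand-ins assumed so that it compiles outside the PostgreSQL tree, and the assertion on the arguments is an addition based on how the caller in the patch builds set_blk_all_frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumptions). */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Wraparound-aware "id1 is newer than id2", modeled on transam.c. */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/*
 * Hypothetical standalone copy of the decision in get_conflict_xid().
 * Argument order follows the function in the patch.
 */
TransactionId
sketch_get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
						TransactionId latest_xid_removed,
						TransactionId frz_conflict_horizon,
						TransactionId visibility_cutoff_xid,
						bool blk_already_av, bool set_blk_all_frozen)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* The caller in the patch only passes true here when do_set_vm is true. */
	assert(!set_blk_all_frozen || do_set_vm);

	/*
	 * Only marking an already all-visible page all-frozen: no tuple changes
	 * visibility on the standby, so no horizon is needed.
	 */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		return InvalidTransactionId;

	/* Start from the VM horizon, else from the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer xmax among the removed tuples takes precedence. */
	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

With do_prune and do_set_vm true, a visibility cutoff of 100 and a newest removed xmax of 120, the sketch returns 120; with a newest removed xmax of 80 it returns 100.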

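The bookkeeping that lazy_scan_prune() does with presult.old_vmbits and presult.new_vmbits in the patch above can be read as a small classification of the VM transition. The sketch below restates it outside the tree. The VmCounters struct and count_vm_change() are hypothetical names introduced here for illustration; the flag values are assumed to match visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed to match PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the three counters kept in LVRelState. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Classify one page's VM transition and bump the matching counters.
 * Returns true if the page was newly set all-frozen in the VM, which is
 * the value the patch stores in *vm_page_frozen.
 */
bool
count_vm_change(uint8_t old_vmbits, uint8_t new_vmbits, VmCounters *counters)
{
	bool		page_frozen = false;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Newly all-visible, and possibly all-frozen at the same time. */
		counters->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			counters->new_visible_frozen_pages++;
			page_frozen = true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Already all-visible in the VM; only the frozen bit is new. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		counters->new_frozen_pages++;
		page_frozen = true;
	}

	return page_frozen;
}
```

When old_vmbits equals new_vmbits, including the case where both are zero because the VM was not updated, neither branch fires and no counter changes.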

  [text/x-patch] v22-0003-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 4-v22-0003-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 34f0009570e117d7d48b560cd097ee25c6cdcc7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ef73eafb4f6..6a87fc371a0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v22-0004-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.2K, 5-v22-0004-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 0d6a06d4533cfe153440d301c3d20915ba07892f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 54 insertions(+), 379 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4b0c49f4bb0..2bff37e03b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba578c1ce0f..80037d690e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1112,9 +1112,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a87fc371a0..5beb410aacc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c751c25a04d..2a9951b7188 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4292,7 +4292,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
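
Reader's note on the surviving visibilitymap_set(): after this patch the function only flips bits in an already-pinned, already-locked VM page and returns the previous bits; WAL logging is left to the caller. The following is a minimal sketch of that bit arithmetic on a flat byte array, not the patch's implementation. toy_vm_set() and the VM_* names are hypothetical; the layout (two bits per heap block, four heap blocks per map byte) and the flag values follow visibilitymapdefs.h, and the page header and map-block selection of the real VM are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the VISIBILITYMAP_* flags in visibilitymapdefs.h. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  0x03

/* Two bits per heap block, so four heap blocks share one map byte. */
#define VM_BITS_PER_HEAPBLOCK  2
#define VM_HEAPBLOCKS_PER_BYTE 4

/*
 * Hypothetical helper: OR 'flags' into the two-bit slot for heap_blk and
 * return the bits that were set beforehand.  No locking, no WAL.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / VM_HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (uint8_t) ((heap_blk % VM_HEAPBLOCKS_PER_BYTE) *
								   VM_BITS_PER_HEAPBLOCK);
	uint8_t		old;

	assert((flags & VM_VALID_BITS) == flags);
	/* must never set all-frozen without also setting all-visible */
	assert(flags != VM_ALL_FROZEN);

	old = (map[map_byte] >> shift) & VM_VALID_BITS;
	if (old != flags)
		map[map_byte] |= (uint8_t) (flags << shift);

	return old;
}
```

Returning the old bits lets a caller such as heap_page_prune_and_freeze() detect that nothing changed (old == new), which is the situation the old_vmbits == new_vmbits check in the pruneheap.c hunk handles.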



  [text/x-patch] v22-0005-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.0K, 6-v22-0005-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From fd0455230968fd919999a5c035f3830d310f0e49 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80037d690e3..989af765702 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -250,7 +250,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -476,7 +476,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1238,11 +1238,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1701,7 +1701,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v22-0006-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 7-v22-0006-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 565014e31aa117fb43993ee2e64da38ffb74f372 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
 visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
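
The deferred check described above can be sketched as follows. This is a
minimal, self-contained illustration and not the patch's code: ToyXid,
toy_xid_visible_to_all() and toy_page_all_visible() are hypothetical
stand-ins, the horizon is modelled as a single integer, and XID wraparound
is ignored. It only shows that tracking the newest xmin lets the expensive
horizon test run at most once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;		/* hypothetical stand-in for TransactionId */
#define TOY_INVALID_XID 0

/* Counts how often the (pretend-expensive) horizon test is invoked. */
static int	toy_horizon_checks = 0;

/*
 * Hypothetical stand-in for the GlobalVisState test. Assumption: the horizon
 * is one plain integer and wraparound never happens.
 */
static bool
toy_xid_visible_to_all(ToyXid horizon, ToyXid xid)
{
	toy_horizon_checks++;
	return xid < horizon;
}

/*
 * Scan every live tuple's xmin with cheap comparisons only, remember the
 * newest one, and consult the horizon a single time afterwards.
 */
static bool
toy_page_all_visible(const ToyXid *xmins, size_t n, ToyXid horizon)
{
	ToyXid		newest = TOY_INVALID_XID;

	assert(xmins != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (xmins[i] > newest)
			newest = xmins[i];
	}

	/* No normal xmin was seen, so there is nothing to test. */
	if (newest == TOY_INVALID_XID)
		return true;

	/* If the newest xmin is visible to all, every older xmin is too. */
	return toy_xid_visible_to_all(horizon, newest);
}
```

In this toy model a page with xmins {100, 250, 180} costs one horizon test
regardless of the number of tuples, at the price of always reading every
xmin on the page.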

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 989af765702..040efe80f2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -439,11 +439,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -886,14 +887,13 @@ heap_page_will_set_vis(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -994,6 +994,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1165,10 +1175,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1698,20 +1707,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5beb410aacc..7c3bb25cc04 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3489,7 +3489,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3505,7 +3505,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3579,7 +3579,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 937b46a77db..2b6a521e4ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,10 +276,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -458,6 +457,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v22-0007-Unset-all_visible-sooner-if-not-freezing.patch (2.4K, 8-v22-0007-Unset-all_visible-sooner-if-not-freezing.patch)
From 44ba53840d52ca255ddb09acb6fd0cda8559a4db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v22 7/9] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 040efe80f2e..90270081acd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1564,8 +1564,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1824,8 +1829,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v22-0008-Allow-on-access-pruning-to-set-pages-all-visible.patch (37.7K, 9-v22-0008-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From bced81f6df3d303679fac2a1414d42f0db401232 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
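
As a rough sketch of how the read-only hint gates VM setting, assuming toy
types rather than the real scan descriptors: ToyScan, ToyBuffer,
TOY_HINT_REL_READ_ONLY and toy_vmbuffer_for_prune() are hypothetical names,
and the flag's bit value here is invented for illustration. The idea is
that the pruning code receives a pointer to the scan's VM buffer slot only
when the scan was flagged as not modifying the relation, and NULL
otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value; the real flag is defined by the patch. */
#define TOY_HINT_REL_READ_ONLY (1u << 0)

typedef int ToyBuffer;			/* hypothetical stand-in for Buffer */
#define TOY_INVALID_BUFFER 0

typedef struct ToyScan
{
	uint32_t	flags;			/* hints passed down from the executor */
	ToyBuffer	vmbuffer;		/* VM page pin kept across heap pages */
} ToyScan;

/*
 * Return the slot that pruning may use to pin a visibility map page, or
 * NULL when the query modifies the relation and must not set the VM.
 */
static ToyBuffer *
toy_vmbuffer_for_prune(ToyScan *scan)
{
	assert(scan != NULL);

	if (scan->flags & TOY_HINT_REL_READ_ONLY)
		return &scan->vmbuffer;

	return NULL;
}
```

Passing a pointer to the slot rather than a buffer value lets the callee
pin a VM page on demand and leave the pin in the scan state for reuse on
the next heap page.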

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c               |  2 +-
 src/backend/access/brin/brin.c                |  3 +-
 src/backend/access/gin/gininsert.c            |  3 +-
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 22 ++++--
 src/backend/access/heap/pruneheap.c           | 69 +++++++++++++++----
 src/backend/access/index/genam.c              |  4 +-
 src/backend/access/index/indexam.c            |  6 +-
 src/backend/access/nbtree/nbtsort.c           |  2 +-
 src/backend/access/table/tableam.c            |  8 ++-
 src/backend/commands/constraint.c             |  2 +-
 src/backend/commands/copyto.c                 |  2 +-
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/commands/typecmds.c               |  4 +-
 src/backend/executor/execIndexing.c           |  2 +-
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execReplication.c        |  8 +--
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  9 ++-
 src/backend/executor/nodeIndexonlyscan.c      |  2 +-
 src/backend/executor/nodeIndexscan.c          | 11 ++-
 src/backend/executor/nodeSeqscan.c            | 26 ++++++-
 src/backend/partitioning/partbounds.c         |  2 +-
 src/backend/utils/adt/selfuncs.c              |  2 +-
 src/include/access/genam.h                    |  3 +-
 src/include/access/heapam.h                   | 30 +++++++-
 src/include/access/tableam.h                  | 19 ++---
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 29 files changed, 210 insertions(+), 65 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c2b879b2bf6..147844690a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2bff37e03b5..ae53e311ce1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,14 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -99,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -753,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 90270081acd..124722f1778 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -203,7 +203,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -218,9 +220,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -297,6 +303,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.vistest = vistest,.cutoffs = NULL
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -785,6 +798,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -795,7 +811,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -811,6 +829,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -834,6 +869,11 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * This will never trigger for on-access pruning because it couldn't have
+	 * done a previous visibility map lookup and thus will always pass
+	 * blk_known_av as false. A future vacuum will have to take care of fixing
+	 * the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -994,6 +1034,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -1004,14 +1052,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -1054,6 +1094,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2340,7 +2381,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2410,8 +2451,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af1310..1e7992dbeb3 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..558c4497993 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -50,6 +50,7 @@ char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
 
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -163,10 +164,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -217,7 +219,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 23ebaa3f230..66c418059fe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5979580139f..35560ac60d9 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3154,7 +3154,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3235,7 +3235,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 401606f840a..4e39ac00f30 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..2f9e9ea6318 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +204,7 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2b6a521e4ea..1e3df54628b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,24 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +440,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..0042636463f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1154,9 +1157,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
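
[Editor's illustration, not part of the patch] The patch above does two things that are easy to model in isolation: each scan node checks whether its range-table index is in es_modified_relids and passes SO_HINT_REL_READ_ONLY if it is not, and heap_page_will_set_vis() declines an on-access VM update when nothing is being pruned or frozen and the update would newly dirty the page or force a full-page image. The following is a minimal standalone sketch of both checks. Every name prefixed with model_ or MODEL_ is hypothetical, the 64-bit mask is only a stand-in for a Bitmapset, and the booleans stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup().

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns to SO_HINT_REL_READ_ONLY. */
#define MODEL_SO_HINT_REL_READ_ONLY (1u << 10)

/* Hypothetical stand-in for PruneReason. */
typedef enum
{
	MODEL_ON_ACCESS,
	MODEL_VACUUM
} ModelReason;

/* Hypothetical stand-in for the inputs the heuristic consults. */
typedef struct
{
	bool		all_visible;	/* models prstate->all_visible */
	bool		do_prune;
	bool		do_freeze;
	bool		buffer_dirty;	/* models BufferIsDirty(heap_buf) */
	bool		needs_fpi;		/* models XLogCheckBufferNeedsBackup(heap_buf) */
} ModelPage;

/* Stand-in for bms_add_member(); only valid for 0 <= rti < 64. */
static uint64_t
model_add_member(uint64_t set, int rti)
{
	return set | (UINT64_C(1) << rti);
}

/* Stand-in for bms_is_member(). */
static bool
model_is_member(int rti, uint64_t set)
{
	return ((set >> rti) & 1) != 0;
}

/* Mirrors the flag computation repeated in the scan nodes. */
static uint32_t
model_scan_flags(int scanrelid, uint64_t modified_relids)
{
	uint32_t	flags = 0;

	if (!model_is_member(scanrelid, modified_relids))
		flags = MODEL_SO_HINT_REL_READ_ONLY;
	return flags;
}

/* Returns true when the on-access path should leave the VM alone. */
static bool
model_skip_vm_update(ModelReason reason, const ModelPage *p)
{
	return reason == MODEL_ON_ACCESS &&
		p->all_visible &&
		!p->do_prune && !p->do_freeze &&
		(!p->buffer_dirty || p->needs_fpi);
}
```

In this model, a scan of RT index 1 in a query that modifies only RT index 2 receives the read-only hint, and a clean all-visible page visited on-access with nothing to prune is skipped rather than dirtied.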



  [text/x-patch] v22-0009-Set-pd_prune_xid-on-insert.patch (6.7K, 10-v22-0009-Set-pd_prune_xid-on-insert.patch)
From 76cb5109137fc1cceb62b4e5091115eee23fc6e9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v22 9/9] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ae53e311ce1..f329f497480 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-20 17:55  Dagfinn Ilmari Mannsåker <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 2 replies; 143+ messages in thread

From: Dagfinn Ilmari Mannsåker @ 2025-11-20 17:55 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Melanie Plageman <[email protected]> writes:

> +			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
> +				.reason = PRUNE_ON_ACCESS,.options = 0,
> +				.vistest = vistest,.cutoffs = NULL
> +			};

I didn't pay much attention to this thread, so I didn't notice this
until it got committed, but I'd like to lodge an objection to this
formatting, especially the lack of spaces before the field names. This
would be much more readable with one struct field per line, i.e.

	PruneFreezeParams params = {
		.relation = rel,
                .buffer = buf,
		.reason = PRUNE_VACUUM_SCAN,
		.options = HEAP_PAGE_PRUNE_FREEZE,
		.vistest = vacrel->vistest,
		.cutoffs = &vacrel->cutoffs,
	};

or at a pinch, if we're really being stingy with the vertical space:

	PruneFreezeParams params = {
		.relation = rel, .buffer = buf,
                .reason = PRUNE_VACUUM_SCAN, .options = HEAP_PAGE_PRUNE_FREEZE,
		.vistest = vacrel->vistest, .cutoffs = &vacrel->cutoffs,
	};

I had a quick grep, and every other designated struct initialiser I
could find uses the one-field-per-line form, but they're not consistent
about the comma after the last field.  I personally prefer having it, so
that one can add more fields later without having to modify the
unrelated line.

- ilmari





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-20 18:02  Dagfinn Ilmari Mannsåker <[email protected]>
  parent: Dagfinn Ilmari Mannsåker <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Dagfinn Ilmari Mannsåker @ 2025-11-20 18:02 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Dagfinn Ilmari Mannsåker <[email protected]> writes:

> Melanie Plageman <[email protected]> writes:
>
>> +			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
>> +				.reason = PRUNE_ON_ACCESS,.options = 0,
>> +				.vistest = vistest,.cutoffs = NULL
>> +			};
>
> I didn't pay much attention to this thread, so I didn't notice this
> until it got committed, but I'd like to lodge an objection to this
> formatting, especially the lack of spaces before the field names. This
> would be much more readable with one struct field per line, i.e.
>
> 	PruneFreezeParams params = {
> 		.relation = rel,
>                 .buffer = buf,
> 		.reason = PRUNE_VACUUM_SCAN,
> 		.options = HEAP_PAGE_PRUNE_FREEZE,
> 		.vistest = vacrel->vistest,
> 		.cutoffs = &vacrel->cutoffs,
> 	};

D'oh, my mail client untabified the .buffer line while I was editing it,
that should of course be:

	PruneFreezeParams params = {
		.relation = rel,
		.buffer = buf,
		.reason = PRUNE_VACUUM_SCAN,
		.options = HEAP_PAGE_PRUNE_FREEZE,
		.vistest = vacrel->vistest,
		.cutoffs = &vacrel->cutoffs,
	};

- ilmari





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-20 22:23  Melanie Plageman <[email protected]>
  parent: Dagfinn Ilmari Mannsåker <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-11-20 22:23 UTC (permalink / raw)
  To: Dagfinn Ilmari Mannsåker <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Nov 20, 2025 at 12:55 PM Dagfinn Ilmari Mannsåker
<[email protected]> wrote:
>
> I didn't pay much attention to this thread, so I didn't notice this
> until it got committed, but I'd like to lodge an objection to this
> formatting, especially the lack of spaces before the field names. This
> would be much more readable with one struct field per line, i.e.
>
>         PruneFreezeParams params = {
>                 .relation = rel,
>                 .buffer = buf,
>                 .reason = PRUNE_VACUUM_SCAN,
>                 .options = HEAP_PAGE_PRUNE_FREEZE,
>                 .vistest = vacrel->vistest,
>                 .cutoffs = &vacrel->cutoffs,
>         };
>
> or at a pinch, if we're really being stingy with the vertical space:
>
>         PruneFreezeParams params = {
>                 .relation = rel, .buffer = buf,
>                 .reason = PRUNE_VACUUM_SCAN, .options = HEAP_PAGE_PRUNE_FREEZE,
>                 .vistest = vacrel->vistest, .cutoffs = &vacrel->cutoffs,
>         };
>
> I had a quick grep, and every other designated struct initialiser I
> could find uses the one-field-per-line form, but they're not consistent
> about the comma after the last field.  I personally prefer having it, so
> that one can add more fields later without having to modify the
> unrelated line.

pgindent doesn't allow for a space after the comma before the period.
One reason I used struct initialization was to save space, so I'm a
bit loath to put every member on its own line. However, I don't want
to make the code less readable to others. So, I will commit an update
as you request.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-21 01:09  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 2 replies; 143+ messages in thread

From: Chao Li @ 2025-11-21 01:09 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Nov 21, 2025, at 01:19, Melanie Plageman <[email protected]> wrote:
> 
> On Wed, Nov 19, 2025 at 6:13 PM Melanie Plageman
> <[email protected]> wrote:
>> 
>> Since it is passed into one of the helpers, I think I agree. Attached
>> v21 has this change.
> 
> I've committed the first three patches. Attached v22 is the remaining
> patches which set the VM in heap_page_prune_and_freeze() for vacuum
> and then allow on-access pruning to also set the VM.
> 

I started reviewing 0001 yesterday and had a few comments, but it was late and I didn’t have time to wrap up, so I decided to review a few more patches today and send the comments together. Since you have already pushed 0001-0003, I’ll still raise my comments on them now, and I will review the rest of the commits next week.

1 - pushed 0001
```
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen on exit, to indicate if the VM bits can be set.
+ * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
+ * passed, because at the moment only callers that also freeze need that
+ * information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
```

I have one concern about this function interface change. The old function comment says "cutoffs contains the freeze cutoffs …. Required if HEAP_PRUNE_FREEZE option is set.", meaning that cutoffs is only useful, and must be set, when HEAP_PRUNE_FREEZE is set. The new comment seems to have lost this indication.

And in the old function interface, cutoffs sat right next to options, so it was easy for readers to notice:

* when options is 0, cutoffs is null
```
			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
```

* when options has HEAP_PAGE_PRUNE_FREEZE, cutoffs is passed in
```
	prune_options = HEAP_PAGE_PRUNE_FREEZE;
	if (vacrel->nindexes == 0)
		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
							   &vacrel->offnum,
							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
```

So, the change doesn’t break anything, but it makes the code a little harder to read. My suggestion is to add an assert in heap_page_prune_and_freeze(), something like:

```
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs != NULL);
```
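
A minimal, self-contained sketch of that kind of "option implies pointer" check follows. The names SketchParams, SketchCutoffs, SKETCH_PRUNE_FREEZE and sketch_prune() are hypothetical stand-ins, not the real PostgreSQL definitions, and the standard assert() is used in place of Assert():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in PostgreSQL headers. */
#define SKETCH_PRUNE_FREEZE (1 << 1)

struct SketchCutoffs
{
	unsigned int oldest_xmin;
};

typedef struct SketchParams
{
	int			options;
	struct SketchCutoffs *cutoffs;
} SketchParams;

/* Returns 1 when "FREEZE set implies cutoffs != NULL" holds for params. */
static int
sketch_params_valid(const SketchParams *params)
{
	return !(params->options & SKETCH_PRUNE_FREEZE) || params->cutoffs != NULL;
}

static void
sketch_prune(const SketchParams *params)
{
	/* Documents the contract at the top of the function. */
	assert(sketch_params_valid(params));

	/* ... pruning and freezing would happen here ... */
}
```

Written as "!A || B", the check reads as "A implies B", so the on-access caller (options 0, cutoffs NULL) passes while a caller that requests freezing without cutoffs trips the assertion.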

2 - pushed 0001
```
+	PruneFreezeParams params = {.relation = rel,.buffer = buf,
+		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+		.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+	};
```

Using a designated initializer is not wrong, but it makes future maintenance harder, because when a new field is added, this initializer will not set the new field explicitly. As far as I remember, I have not seen a designated initializer in PG code. I only remember seeing 3 ways:

* use an initialization function to set every field individually
* palloc0 to set all 0, then set non-zero fields individually
* {0} to set all 0, then set non-zero fields individually
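
For reference, standard C zero-initializes any member omitted from a designated initializer, the same as the {0} form does. A minimal sketch, assuming a hypothetical struct (SketchParams and its fields are illustrative and not taken from the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct; new_field stands for a member added later. */
typedef struct SketchParams
{
	int			options;
	const void *cutoffs;
	int			new_field;
} SketchParams;

/* Designated initializer: members not named are zero-initialized. */
static SketchParams
sketch_designated(void)
{
	SketchParams p = {.options = 2};

	return p;
}

/* {0} followed by assignments to the non-zero members. */
static SketchParams
sketch_zero_then_assign(void)
{
	SketchParams p = {0};

	p.options = 2;
	return p;
}

/* Both forms yield the same member values. */
static void
sketch_check_equivalent(void)
{
	SketchParams a = sketch_designated();
	SketchParams b = sketch_zero_then_assign();

	assert(a.options == b.options);
	assert(a.cutoffs == NULL && b.cutoffs == NULL);
	assert(a.new_field == 0 && b.new_field == 0);
}
```

In both forms a newly added member starts at zero, so the maintenance question is whether zero is an acceptable default for it, not whether it holds garbage.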

3 - pushed 0002
```
 					prstate->all_visible = false;
+					prstate->all_frozen = false;
```

Nit: Setting both fields to false is now repeated in 6 places. Adding a static inline function, say PruneClearVisibilityFlags(), may improve maintainability.
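
A rough sketch of such a helper, assuming a cut-down hypothetical SketchPruneState with only the two flags (the real PruneState has many more members, and the helper name is just the one suggested above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the pruning state. */
typedef struct SketchPruneState
{
	bool		all_visible;
	bool		all_frozen;
} SketchPruneState;

/*
 * A page that is not all-visible cannot be all-frozen either, so the two
 * flags are always cleared together.
 */
static inline void
PruneClearVisibilityFlags(SketchPruneState *prstate)
{
	assert(prstate != NULL);

	prstate->all_visible = false;
	prstate->all_frozen = false;
}
```

Each of the call sites would then shrink to a single PruneClearVisibilityFlags(prstate) call.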

4 - pushed 0003
```
+ * opporunistically freeze, to indicate if the VM bits can be set.  They are
```

Typo: “opporunistically” is missing a “t”.

I’ll stop here for today and continue reviewing the rest of the commits next week.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-24 08:07  Chao Li <[email protected]>
  parent: Chao Li <[email protected]>
  1 sibling, 2 replies; 143+ messages in thread

From: Chao Li @ 2025-11-24 08:07 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Nov 21, 2025, at 09:09, Chao Li <[email protected]> wrote:
> 
> I’ll stop here for today and continue reviewing the rest of the commits next week.

I continued reviewing today.

0004: This is a pure refactoring. It splits heap_page_prune_and_freeze() into multiple small functions. LGTM, no comment.

0005: Overall good; a few nit comments below.

0006, 0007 look good, no comment.

5 - 0005 - heapam.h
```
+	/*
+	 *
+	 * vmbuffer is the buffer that must already contain contain the required
+	 * block of the visibility map if we are to update it. blk_known_av is the
```

Nit: 

* an unnecessary empty comment line.
* “contain contain” => “contain”

6 - 0005 heapam_xlog.c
```
+		 * The critical integrity requirement here is that we must never end
+		 * up with with the visibility map bit set and the page-level
```

Nit: “with with” => “with”

I will continue reviewing 0008 and the rest tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-24 09:31  Chao Li <[email protected]>
  parent: Chao Li <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Chao Li @ 2025-11-24 09:31 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Nov 24, 2025, at 16:07, Chao Li <[email protected]> wrote:
> 
> 0006, 0007 look good, no comment.

I missed a nit comment in 0007:

7 - 0007
```
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
```

VISIBLITYMAP_XLOG_CATALOG_REL is missing an “I” after the “B”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-24 22:24  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2025-11-24 22:24 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
> From 363f0e4ac9ac7699a6d9c2a267a2ad60825407c8 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 17 Nov 2025 15:11:27 -0500
> Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
>
> Refactor the setup and planning phases of pruning and freezing into
> helpers. This streamlines heap_page_prune_and_freeze() and makes it more
> clear when the examination of tuples ends and page modifications begin.

I think this is a considerable improvement.

I didn't review this in a lot of detail, given that it's mostly moving
code.

One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
argument, prune_freeze_setup() gets the entire prstate,
heap_page_will_freeze() gets the Relation. It's what they need, but still a
bit odd.


FWIW, I found the diff generated by
  git show --diff-algorithm=minimal --color-moved-ws=allow-indentation-change

useful for viewing this diff; it showed much more clearly how little the code
actually changed.



> From 8ebaf434af5afaebcf71550116c59355b3bf15c1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 8 Oct 2025 15:39:01 -0400
> Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
>  prune/freeze
>
> Vacuum no longer emits a separate WAL record for each page set
> all-visible or all-frozen during phase I. Instead, visibility map
> updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
> is already emitted for pruning and freezing.
>
> Previously, heap_page_prune_and_freeze() determined whether a page was
> all-visible, but the corresponding VM bits were only set later in
> lazy_scan_prune(). Now the VM is updated immediately in
> heap_page_prune_and_freeze(), at the same time as the heap
> modifications.
>
> This change applies only to vacuum phase I, not to pruning performed
> during normal page access.

Hm. This change makes sense, but unfortunately I find it somewhat hard to
review. There are a lot of changes that don't obviously work towards one
goal in this commit.

>@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
>     Relation    relation;       /* relation containing buffer to be pruned */
>     Buffer      buffer;         /* buffer to be pruned */
> 
>+    /*
>+     *
>+     * vmbuffer is the buffer that must already contain contain the required
>+     * block of the visibility map if we are to update it. blk_known_av is the
>+     * visibility status of the heap block as of the last call to
>+     * find_next_unskippable_block().
>+     */
>+    Buffer      vmbuffer;
>+    bool        blk_known_av;
>+
>     /*
>      * The reason pruning was performed.  It is used to set the WAL record
>      * opcode which is used for debugging and analysis purposes.

What is blk_known_av set to if the block is known to not be all visible?
Compared to the case where we did not yet determine the visibility status of
the block?


>@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
>      * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
>      * LP_UNUSED during pruning.
>      *
>-     * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
>-     * will return 'all_visible', 'all_frozen' flags to the caller.
>+     * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
>+     *
>+     * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
>+     * in the VM.
>      */
>     int         options;

nit^2: The previous version and the other paragraphs end in a .


> @@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
>  		/* There should be no more data */
>  		Assert((char *) frz_offsets == dataptr + datalen);
>
> -		if (vmflags & VISIBILITYMAP_VALID_BITS)
> -			PageSetAllVisible(page);
> -
> -		MarkBufferDirty(buffer);
> +		if (do_prune || nplans > 0)
> +			mark_buffer_dirty = set_lsn = true;
>
>  		/*
> -		 * See log_heap_prune_and_freeze() for commentary on when we set the
> -		 * heap page LSN.
> +		 * The critical integrity requirement here is that we must never end
> +		 * up with with the visibility map bit set and the page-level
> +		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page

s/clear/unset/ would be a tad clearer.


> +		 * modification would fail to clear the visibility map bit.
> +		 *
> +		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
> +		 * marking an all-visible page all-frozen). If only the VM is updated,
> +		 * the heap page need not be dirtied.
>  		 */
> -		if (do_prune || nplans > 0 ||
> -			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
> +		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
> +		{
> +			PageSetAllVisible(page);
> +			mark_buffer_dirty = true;
> +
> +			/*
> +			 * See log_heap_prune_and_freeze() for commentary on when we set
> +			 * the heap page LSN.
> +			 */
> +			if (XLogHintBitIsNeeded())
> +				set_lsn = true;
> +		}

Maybe worth adding something like Assert(!set_lsn || mark_buffer_dirty)?
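
As a self-contained illustration of that invariant, here is a sketch that models only the flag logic of the hunk above. SketchRedoFlags, sketch_redo_flags() and the boolean parameters are hypothetical simplifications (for example, hint_bits_logged stands in for XLogHintBitIsNeeded()), and the standard assert() replaces Assert():

```c
#include <assert.h>
#include <stdbool.h>

typedef struct SketchRedoFlags
{
	bool		mark_buffer_dirty;
	bool		set_lsn;
} SketchRedoFlags;

/* Models how the redo routine decides to dirty the page and stamp its LSN. */
static SketchRedoFlags
sketch_redo_flags(bool do_prune, int nplans, bool vm_bits_requested,
				  bool page_all_visible, bool hint_bits_logged)
{
	SketchRedoFlags f = {false, false};

	/* Pruning or freezing always modifies the heap page. */
	if (do_prune || nplans > 0)
		f.mark_buffer_dirty = f.set_lsn = true;

	/* Setting PD_ALL_VISIBLE dirties the page; the LSN is conditional. */
	if (vm_bits_requested && !page_all_visible)
	{
		f.mark_buffer_dirty = true;
		if (hint_bits_logged)
			f.set_lsn = true;
	}

	/* The LSN must never be set on a page that is not marked dirty. */
	assert(!f.set_lsn || f.mark_buffer_dirty);

	return f;
}
```

In this model the assertion cannot fire, because every path that sets set_lsn also sets mark_buffer_dirty; its value would be in guarding against later edits that break that pairing.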


> +/*
> + * Decide whether to set the visibility map bits for heap_blk, using
> + * information from PruneState and blk_known_av. Some callers may already
> + * have examined this page’s VM bits (e.g., VACUUM in the previous
> + * heap_vac_scan_next_block() call) and can pass that along.

That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
to me how the state of a block where we did determine that the block is *not*
all-visible is represented.


> + * Returns true if one or both VM bits should be set, along with the desired
> + * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
> + * should be set on the heap page.
> + */
> +static bool
> +heap_page_will_set_vis(Relation relation,
> +					   BlockNumber heap_blk,
> +					   Buffer heap_buf,
> +					   Buffer vmbuffer,
> +					   bool blk_known_av,
> +					   const PruneState *prstate,
> +					   uint8 *vmflags,
> +					   bool *do_set_pd_vis)
> +{
> +	Page		heap_page = BufferGetPage(heap_buf);
> +	bool		do_set_vm = false;
> +
> +	*do_set_pd_vis = false;
> +
> +
> +	/*
> +	 * Now handle two potential corruption cases:
> +	 *
> +	 * These do not need to happen in a critical section and are not
> +	 * WAL-logged.
> +	 *
> +	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
> +	 * page-level bit is clear.  However, it's possible that in vacuum the bit
> +	 * got cleared after heap_vac_scan_next_block() was called, so we must
> +	 * recheck with buffer lock before concluding that the VM is corrupt.
> +	 */
> +	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
> +			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> +	{
> +		ereport(WARNING,
> +				(errcode(ERRCODE_DATA_CORRUPTED),
> +				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> +						RelationGetRelationName(relation), heap_blk)));
> +
> +		visibilitymap_clear(relation, heap_blk, vmbuffer,
> +							VISIBILITYMAP_VALID_BITS);

Wait, why is it ok to perform this check iff blk_known_av is set?


> +			old_vmbits = visibilitymap_set_vmbits(blockno,
> +												  vmbuffer, new_vmbits,
> +												  params->relation->rd_locator);
> +			if (old_vmbits == new_vmbits)
> +			{
> +				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> +				/* Unset so we don't emit WAL since no change occurred */
> +				do_set_vm = false;
> +			}
> +		}

What can lead to this path being reached? Doesn't this mean that something
changed the state of the VM while we were holding an exclusive lock on the
heap buffer?


> +		/*
> +		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
> +		 * only updating the VM and it turns out it was already set, we will
> +		 * have unset do_set_vm earlier. As such, check it again before
> +		 * emitting the record.
> +		 */
> +		if (RelationNeedsWAL(params->relation) &&
> +			(do_prune || do_freeze || do_set_vm))
> +		{
>  			log_heap_prune_and_freeze(params->relation, buffer,
> -									  InvalidBuffer,	/* vmbuffer */
> -									  0,	/* vmflags */
> +									  do_set_vm ? vmbuffer : InvalidBuffer,
> +									  do_set_vm ? new_vmbits : 0,
>  									  conflict_xid,
> -									  true, params->reason,
> +									  true, /* cleanup lock */
> +									  do_set_pd_vis,
> +									  params->reason,
>  									  prstate.frozen, prstate.nfrozen,
>  									  prstate.redirected, prstate.nredirected,
>  									  prstate.nowdead, prstate.ndead,

This function is now taking 16 parameters :/


> @@ -959,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>
>  	END_CRIT_SECTION();
>
> +	if (do_set_vm)
> +		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> +
> +	/*
> +	 * During its second pass over the heap, VACUUM calls
> +	 * heap_page_would_be_all_visible() to determine whether a page is
> +	 * all-visible and all-frozen. The logic here is similar. After completing
> +	 * pruning and freezing, use an assertion to verify that our results
> +	 * remain consistent with heap_page_would_be_all_visible().
> +	 */
> +#ifdef USE_ASSERT_CHECKING
> +	if (prstate.all_visible)
> +	{
> +		TransactionId debug_cutoff;
> +		bool		debug_all_frozen;
> +
> +		Assert(prstate.lpdead_items == 0);
> +		Assert(prstate.cutoffs);
> +
> +		if (!heap_page_is_all_visible(params->relation, buffer,
> +									  prstate.cutoffs->OldestXmin,
> +									  &debug_all_frozen,
> +									  &debug_cutoff, off_loc))
> +			Assert(false);

I don't love Assert(false), because the message for the assert failure is
pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
() that has no body other than Assert(false)? Just Assert the expression
directly.
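
To make the difference concrete, a small sketch using the standard assert() and a hypothetical predicate (sketch_page_is_all_visible() is a toy stand-in, not the real function):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy predicate: the page counts as all-visible if it has no dead items. */
static bool
sketch_page_is_all_visible(int ndead_items)
{
	return ndead_items == 0;
}

/* Indirect form: a failure reports only that "false" was asserted. */
static void
sketch_check_indirect(int ndead_items)
{
	if (!sketch_page_is_all_visible(ndead_items))
		assert(false);
}

/* Direct form: a failure reports the text of the condition itself. */
static void
sketch_check_direct(int ndead_items)
{
	assert(sketch_page_is_all_visible(ndead_items));
}
```

Both forms accept and reject the same inputs; only the text of the failure message differs. In the patch the whole block sits under USE_ASSERT_CHECKING, so the predicate call is already compiled only in assert-enabled builds.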


> From 34f0009570e117d7d48b560cd097ee25c6cdcc7c Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 27 Sep 2025 11:55:21 -0400
> Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
>
> As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
> marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This whole business of treating empty pages as all-visible continues to not
make any sense to me. Particularly in combination with a not crashsafe FSM it
just seems ... unhelpful. It also means that there's a decent chance of
extra WAL when bulk extending. But that's not the fault of this change.


> From 0d6a06d4533cfe153440d301c3d20915ba07892f Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 27 Sep 2025 11:55:36 -0400
> Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
>
> As no remaining users emit XLOG_HEAP2_VISIBLE records.
> This includes deleting the xl_heap_visible struct and all functions
> responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Probably worth mentioning that this changes the VM API.


> @@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
>   *
>   * This is used for several different page maintenance operations:
>   *
> - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
> + * - Page pruning, in vacuum phase I or on-access: Some items are
>   *   redirected, some marked dead, and some removed altogether.
>   *
> - * - Freezing: Items are marked as 'frozen'.
> + * - Freezing: During vacuum phase I, items are marked as 'frozen'
>   *
> - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
> + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
> + *   marked as unused.
>   *
> - * They have enough commonalities that we use a single WAL record for them
> + * - VM updates: After vacuum phases I and III, the heap page may be marked
> + *   all-visible and all-frozen.
> + *
> + * These changes all happen together, so we use a single WAL record for them
>   * all.
>   *
>   * If replaying the record requires a cleanup lock, pass cleanup_lock =
>   true.

How's that related to the commit's subject?


> From fd0455230968fd919999a5c035f3830d310f0e49 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Fri, 18 Jul 2025 16:30:04 -0400
> Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
>  GlobalVisXidVisibleToAll()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> The function is currently only used to check whether a tuple’s xmax is
> visible to all transactions (and thus removable). Upcoming changes will
> also use it to test whether a tuple’s xmin is visible to all to
> decide if a page can be marked all-visible in the visibility map.
>
> The new name, GlobalVisXidVisibleToAll(), better reflects this broader
> purpose.

If we want this - and I'm not convinced we do - I think it needs to go further
and change the other uses of removable in
procarray.c. ComputeXidHorizonsResult has a lot of related fields.

There's also GetOldestNonRemovableTransactionId(),
GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
included in the renaming.


> From 565014e31aa117fb43993ee2e64da38ffb74f372 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 29 Jul 2025 14:38:24 -0400
> Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
>  visibility
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> During vacuum's first and third phases, we examine tuples' visibility
> to determine if we can set the page all-visible in the visibility map.
>
> Previously, this check compared tuple xmins against a single XID chosen at
> the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> enables future work to set the VM during on-access pruning, since ordinary
> queries have access to GlobalVisState but not OldestXmin.
>
> This also benefits vacuum directly: in some cases, GlobalVisState may
> advance during a vacuum, allowing more pages to become considered
> all-visible. And, in the future, we could easily add a heuristic to
> update GlobalVisState more frequently during vacuums of large tables. In
> the rare case that the GlobalVisState moves backward, vacuum falls back
> to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
> wasn’t yet prunable according to the GlobalVisState.

I think it may be better to make sure that the GlobalVisState can't move
backward.


> From bced81f6df3d303679fac2a1414d42f0db401232 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 29 Jul 2025 14:34:30 -0400
> Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
>
> Many queries do not modify the underlying relation. For such queries, if
> on-access pruning occurs during the scan, we can check whether the page
> has become all-visible and update the visibility map accordingly.
> Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> all-frozen.

> Supporting this requires passing information about whether the relation
> is modified from the executor down to the scan descriptor.

I think it'd be good to split this part into a separate commit. The set of
folks to review that are distinct (and broader) from the ones looking at
heapam internals.


Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-11-25 21:43  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-11-25 21:43 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the review!

On Thu, Nov 20, 2025 at 8:10 PM Chao Li <[email protected]> wrote:
>
>   * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
> - * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
> - * multi-XID seen on the relation so far.  They will be updated with oldest
> - * values present on the page after pruning.  After processing the whole
> - * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
> - * for the relation.
> + * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
> + * oldest XID and multi-XID seen on the relation so far.  They will be updated
> + * with oldest values present on the page after pruning.  After processing the
> + * whole relation, VACUUM can use these values as the new
> + * relfrozenxid/relminmxid for the relation.
>   */
>  void
> -heap_page_prune_and_freeze(Relation relation, Buffer buffer,
> -                                                  GlobalVisState *vistest,
> -                                                  int options,
> -                                                  struct VacuumCutoffs *cutoffs,
> +heap_page_prune_and_freeze(PruneFreezeParams *params,
>                                                    PruneFreezeResult *presult,
> -                                                  PruneReason reason,
>                                                    OffsetNumber *off_loc,
>                                                    TransactionId *new_relfrozen_xid,
>                                                    MultiXactId *new_relmin_mxid)
>  {
> ```
>
> For this function interface change, I got a concern. The old function comment says "cutoffs contains the freeze cutoffs …. Required if HEAP_PRUNE_FREEZE option is set.”, meaning that cutoffs is only useful and must be set when HEAP_PRUNE_FREEZE is set. But the new comment seems to have lost this indication.

I did move that comment into the PruneFreezeParams struct definition.

> And in the old function interface, cutoffs sat right next to options, readers are easy to notice:
>
> * when options is 0, cutoffs is null
> ```
>                         heap_page_prune_and_freeze(relation, buffer, vistest, 0,
>                                                                            NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
> ```
>
> * when options has HEAP_PAGE_PRUNE_FREEZE, cutoffs is passed in
> ```
>         prune_options = HEAP_PAGE_PRUNE_FREEZE;
>         if (vacrel->nindexes == 0)
>                 prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
>
>         heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
>                                                            &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
>                                                            &vacrel->offnum,
>                                                            &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
> ```
>
> So, the change doesn’t break anything, but makes code a little bit harder to read. So, my suggestion is to add an assert in heap_page_prune_and_freeze, something like:
>
> Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs != NULL);

That's fair. I've gone ahead and pushed a commit with your suggested assert.

> 2 - pushed 0001
> ```
> +       PruneFreezeParams params = {.relation = rel,.buffer = buf,
> +               .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
> +               .cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
> +       };
> ```
>
> Using a designated initializer is not wrong, but makes future maintenance harder, because when a new field is added, this initializer will leave the new field uninitiated. From my impression, I don’t remember I ever see a designated initializer in PG code. I only remember 3 ways I have seen:
>
> * use an initialize function to set every fields individually
> * palloc0 to set all 0, then set non-zero fields individually
> * {0} to set all 0, then set non-zero fields individually

Well, the main reason you don't see them much in the code is that a
lot of the code is old and we didn't require a c99-compliant compiler
until fairly recently (okay like 2018/2019) -- and thus couldn't use
designated initializers.

I agree that they are rare for structs (they are quite commonly used
with arrays), but they are there -- for example these bufmgr init
macros

#define BMR_REL(p_rel) \
    ((BufferManagerRelation){.rel = p_rel})
#define BMR_SMGR(p_smgr, p_relpersistence) \
    ((BufferManagerRelation){.smgr = p_smgr, .relpersistence = p_relpersistence})
#define BMR_GET_SMGR(bmr) \
    (RelationIsValid((bmr).rel) ? RelationGetSmgr((bmr).rel) : (bmr).smgr)

I don't see how it would be harder to remember to initialize a field
with a designated initializer vs if you have to just remember to add a
line initializing that field in the code. And using a designated
initializer ensures all unspecified fields will be zeroed out.
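
To illustrate the zeroing point, here is a minimal standalone sketch. The
DemoParams struct and both functions are hypothetical stand-ins made up for
this example (not PruneFreezeParams or any PostgreSQL code). It only shows
the C99 rule that members not named in a designated initializer are
initialized to zero, NULL, or false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parameter struct; not the real PruneFreezeParams. */
typedef struct DemoParams
{
    int         options;
    const void *cutoffs;
    bool        newly_added_field;  /* pretend this member was added later */
} DemoParams;

/* Names only one member, the way a caller written before the new field would. */
DemoParams
make_demo_params(int options)
{
    DemoParams  params = {.options = options};

    return params;
}

/* Checks that the members the initializer did not name are zero/NULL/false. */
void
demo_check_unnamed_members(void)
{
    DemoParams  params = make_demo_params(4);

    assert(params.options == 4);
    assert(params.cutoffs == NULL);
    assert(!params.newly_added_field);
}
```

So a caller that predates newly_added_field still gets a well-defined
(false) value for it rather than stack garbage.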

In general, I have seen threads [1] encouraging the use of designated
initializers, so I'm inclined to leave it as is since it is committed,
and I haven't heard other pushback.

> 3 - pushed 0002
> ```
>                                         prstate->all_visible = false;
> +                                       prstate->all_frozen = false;
> ```
>
> Nit: Now setting the both fields to false repeat in 6 places. Maybe add a static inline function, say PruneClearVisibilityFlags(), may improve maintainability.

I see your point. However, I don't think it would necessarily be an
improvement. This function already has a lot of helpers you have to
jump to to understand what's going on. And in the place where they are
cleared most often, heap_prune_record_unchanged_lp_normal(), we set
other fields of the prstate directly, so it is nice visual symmetry in
my opinion to set them inline.

I did want to use chained assignment (all_visible = all_frozen =
false), but I have had people complain about that in my code before
more than once, so I refrained.

> 4 - pushed 0003
> ```
> + * opporunistically freeze, to indicate if the VM bits can be set.  They are
> ```
>
> Typo: opporunistically, missed a “t”.

Fixed in same commit that added the assert.

- Melanie

[1] https://www.postgresql.org/message-id/flat/5B873BED.9080501%40anastigmatix.net#4a067c1314783f0e171b4...






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-03 23:07  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-03 23:07 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

Thanks for the review! All the small changes you suggested I made in
attached v23 unless otherwise noted below.

On Mon, Nov 24, 2025 at 5:24 PM Andres Freund <[email protected]> wrote:
>
> On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
> > Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
>
> One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
> argument, prune_freeze_setup() gets the entire prstate,
> heap_page_will_freeze() gets the Relation. It's what they need, but still a
> bit odd.

They all get the PruneState actually.

I've committed this patch (but actually have to do a follow-on commit
to silence coverity. Will do that next.)

> > Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
> >  prune/freeze
>
>
> Hm. This change makes sense, but unfortunately I find it somewhat hard to
> review. There are a lot of changes that don't obviously work towards one
> goal in this commit.

I've split up the first commit into 4 patches in attached v23
(0002-0005). They are not meant to be committed separately but are
separate only for ease of review. They comprise the logical steps for
getting to the final code state. I originally had it split up but got
feedback it was more work to review them each. So, let's see how this
goes.

> >@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
>
> >+     * vmbuffer is the buffer that must already contain contain the required
> >+     * block of the visibility map if we are to update it. blk_known_av is the
> >+     * visibility status of the heap block as of the last call to
> >+     * find_next_unskippable_block().
> >+     */
> >+    Buffer      vmbuffer;
> >+    bool        blk_known_av;
>
> What is blk_known_av set to if the block is known to not be all visible?
> Compared to the case where we did not yet determine the visibility status of
> the block?

blk_known_av should always be set to false if the caller doesn't know.
It is used as an optimization. I've added to the comment in this
struct to clarify that. More on this further down in my mail.

> > + * Decide whether to set the visibility map bits for heap_blk, using
> > + * information from PruneState and blk_known_av. Some callers may already
> > + * have examined this page’s VM bits (e.g., VACUUM in the previous
> > + * heap_vac_scan_next_block() call) and can pass that along.
>
> That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
> to me how the state of a block where did determine that the block is *not*
> all-visible is represented.

There is no need to distinguish between knowing it is not all-visible
and not knowing if it is all-visible. That is, "not known" and "known
not" are the same for our purposes. This is only an optimization and
not needed for correctness. I've tried to add comments to this effect
in various places where blk_known_av is used.

> > +     else if (blk_known_av && !PageIsAllVisible(heap_page) &&
> > +                      visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> > +     {
> > +             ereport(WARNING,
> > +                             (errcode(ERRCODE_DATA_CORRUPTED),
> > +                              errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> > +                                             RelationGetRelationName(relation), heap_blk)));
> > +
> > +             visibilitymap_clear(relation, heap_blk, vmbuffer,
> > +                                                     VISIBILITYMAP_VALID_BITS);
>
> Wait, why is it ok to perform this check iff blk_known_av is set?

This is existing logic in vacuum. It would be okay to perform the
check even if blk_known_av is false but might be too expensive for the
common case where the page is not all-visible (especially on-access).
The next vacuum should be able to enter this code path and fix it. Or
do you think it will be cheap enough because the caller will have read
in and pinned the VM page?

> > +                     old_vmbits = visibilitymap_set_vmbits(blockno,
> > +                                                                                               vmbuffer, new_vmbits,
> > +                                                                                               params->relation->rd_locator);
> > +                     if (old_vmbits == new_vmbits)
> > +                     {
> > +                             LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> > +                             /* Unset so we don't emit WAL since no change occurred */
> > +                             do_set_vm = false;
> > +                     }
> > +             }
>
> What can lead to this path being reached? Doesn't this mean that something
> changed the state of the VM while we were holding an exclusive lock on the
> heap buffer?

This shouldn't be in this commit (I've fixed that). However, it is
needed once we have on-access VM setting because we could have set the
page all-visible in the VM on-access in between when
find_next_unskippable_block() first checks the VM and sets
all_visible_according_to_vm/blk_known_av and when we take the lock and
prune/freeze the page.
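
A rough standalone sketch of that race, for illustration only. Everything
prefixed demo_/DEMO_ is a hypothetical stand-in, the VM is modeled as a
single byte per heap block, and the assumption that the setter ORs in the
new bits and returns the old ones is mine, not taken from the patch. The
comparison mirrors the old_vmbits == new_vmbits check in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ALL_VISIBLE    0x01
#define DEMO_ALL_FROZEN     0x02

/*
 * Stand-in for the VM setter: ORs new_bits into the block's VM bits and
 * returns the bits that were set beforehand (assumed behavior).
 */
uint8_t
demo_set_vmbits(uint8_t *vm_bits, uint8_t new_bits)
{
    uint8_t     old_bits = *vm_bits;

    *vm_bits |= new_bits;
    return old_bits;
}

/* Returns true if the VM changed, i.e. the update needs to be WAL-logged. */
bool
demo_vm_update_needs_wal(uint8_t *vm_bits, uint8_t new_bits)
{
    return demo_set_vmbits(vm_bits, new_bits) != new_bits;
}

/* Walks through the scenario: on-access sets the bit before vacuum does. */
void
demo_check_concurrent_set(void)
{
    uint8_t     vm_bits = 0;    /* vacuum's earlier look saw nothing set */

    /* An on-access prune gets there first and sets all-visible. */
    assert(demo_vm_update_needs_wal(&vm_bits, DEMO_ALL_VISIBLE));

    /* Vacuum, acting on its stale hint, tries the same: nothing to log. */
    assert(!demo_vm_update_needs_wal(&vm_bits, DEMO_ALL_VISIBLE));

    /* Adding all-frozen later is a real change again. */
    assert(demo_vm_update_needs_wal(&vm_bits,
                                    DEMO_ALL_VISIBLE | DEMO_ALL_FROZEN));
}
```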

> >                       log_heap_prune_and_freeze(params->relation, buffer,
> > -                                                                       InvalidBuffer,        /* vmbuffer */
> > -                                                                       0,    /* vmflags */
> > +                                                                       do_set_vm ? vmbuffer : InvalidBuffer,
> > +                                                                       do_set_vm ? new_vmbits : 0,
> >                                                                         conflict_xid,
> > -                                                                       true, params->reason,
> > +                                                                       true, /* cleanup lock */
> > +                                                                       do_set_pd_vis,
> > +                                                                       params->reason,
> >                                                                         prstate.frozen, prstate.nfrozen,
> >                                                                         prstate.redirected, prstate.nredirected,
> >                                                                         prstate.nowdead, prstate.ndead,
>
> This function is now taking 16 parameters :/

Is this complaint about readability or performance of parameter
passing? Because if it's the latter, I can't imagine that will be
noticeable when compared to the overhead of emitting a WAL record.

I could add a struct just for passing the parameters to the
log_heap_prune_and_freeze(). Something like:

typedef struct PruneFreezeChanges
{
    int            nredirected;
    int            ndead;
    int            nunused;
    int            nfrozen;
    OffsetNumber *redirected;
    OffsetNumber *nowdead;
    OffsetNumber *nowunused;
    HeapTupleFreeze *frozen;
} PruneFreezeChanges;

PruneFreezeChanges c = {
        .redirected = prstate.redirected,
        .nredirected = prstate.nredirected,
        .ndead = prstate.ndead,
        .nowdead = prstate.nowdead,
        .nunused = prstate.nunused,
        .nowunused = prstate.nowunused,
        .nfrozen = prstate.nfrozen,
        .frozen = prstate.frozen,
};

log_heap_prune_and_freeze(params->relation, buffer,
                          InvalidBuffer,    /* vmbuffer */
                          0,                /* vmflags */
                          conflict_xid,
                          true, params->reason,
                          c);

However, I fear it is a bit confusing to have this struct just to pass
parameters to log_heap_prune_and_freeze(). We can't embed that struct
in the PruneState because then all the arrays would have to be inline
in the PruneFreezeChanges struct, which would mean vacuum phase III
stack-allocating 4*MaxHeapTuplesPerPage more OffsetNumbers than it
currently has and needs.
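
To put a number on that, here is a small standalone sketch. All Demo* names
are hypothetical, the 291 is the assumed value of MaxHeapTuplesPerPage for
the default 8kB block size, and the array lengths (2x for redirected, 1x
each for nowdead and nowunused) are assumptions based on the 4* figure
above. The frozen array is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ITEMS 291      /* assumed MaxHeapTuplesPerPage, 8kB blocks */

typedef uint16_t DemoOffset;    /* stand-in for OffsetNumber */

/* Pointer-based variant: the arrays live elsewhere (e.g. in the PruneState). */
typedef struct DemoChangesByPointer
{
    int         nredirected;
    int         ndead;
    int         nunused;
    DemoOffset *redirected;
    DemoOffset *nowdead;
    DemoOffset *nowunused;
} DemoChangesByPointer;

/* Inline variant: every user of the struct pays for all of the arrays. */
typedef struct DemoChangesInline
{
    int         nredirected;
    int         ndead;
    int         nunused;
    DemoOffset  redirected[2 * DEMO_MAX_ITEMS];
    DemoOffset  nowdead[DEMO_MAX_ITEMS];
    DemoOffset  nowunused[DEMO_MAX_ITEMS];
} DemoChangesInline;

/* Bytes taken up by the inline arrays alone. */
size_t
demo_inline_array_bytes(void)
{
    return sizeof(((DemoChangesInline *) 0)->redirected) +
        sizeof(((DemoChangesInline *) 0)->nowdead) +
        sizeof(((DemoChangesInline *) 0)->nowunused);
}

/* Confirms the inline arrays add up to 4 * DEMO_MAX_ITEMS offsets. */
void
demo_check_sizes(void)
{
    assert(demo_inline_array_bytes() ==
           4 * DEMO_MAX_ITEMS * sizeof(DemoOffset));
    assert(sizeof(DemoChangesInline) > sizeof(DemoChangesByPointer));
}
```

Under those assumptions the inline arrays come to 4 * 291 two-byte offsets
per struct, which a caller that only needs nowunused would carry for
nothing.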

The only other related parameters I see that could be stuck into a
struct are vmflags and set_pd_all_vis -- maybe called VisiChanges or
HeapPageVisiChanges. But again, I'm not sure if it is worth adding a
new struct for this.

> > +#ifdef USE_ASSERT_CHECKING
> > +     if (prstate.all_visible)
> > +     {
> > +             TransactionId debug_cutoff;
> > +             bool            debug_all_frozen;
> > +
> > +             Assert(prstate.lpdead_items == 0);
> > +             Assert(prstate.cutoffs);
> > +
> > +             if (!heap_page_is_all_visible(params->relation, buffer,
> > +                                                                       prstate.cutoffs->OldestXmin,
> > +                                                                       &debug_all_frozen,
> > +                                                                       &debug_cutoff, off_loc))
> > +                     Assert(false);
>
> I don't love Assert(false), because the message for the assert failure is
> pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
> () that has no body other than Assert(false)? Just Assert the expression
> directly.

This is existing code. I agree it's weird, but I remember Peter saying
something about why he did it this way that I no longer remember.
Anyway, 0001 changes the assert as you suggest.

> > Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
> >
> > As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
> > marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
>
> This whole business of treating empty pages as all-visible continues to not
> make any sense to me. Particularly in combination with a not crashsafe FSM it
> just seems ... unhelpful. It also means that there there's a decent chance of
> extra WAL when bulk extending. But that's not the fault of this change.

Is the argument for setting them av/af that we can skip them more
easily in future vacuums (i.e. not have to read in the page and take a
lock etc)?

> > Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
> >
> > As no remaining users emit XLOG_HEAP2_VISIBLE records.
> > This includes deleting the xl_heap_visible struct and all functions
> > responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
>
> Probably worth mentioning that this changes the VM API.

I've added a mention about this in the commit.
Are you imagining I have any comments anywhere about how
XLOG_HEAP2_VISIBLE used to exist?

I realized I need to bump XLOG_PAGE_MAGIC in this commit because the
code to replay XLOG_HEAP2_VISIBLE records is gone now.

What I'm not sure about is whether I have to bump it in some of the
other commits that change which WAL records are emitted by a particular
operation (e.g. not emitting a separate VM record from phase I of vacuum).

> > - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
> > + * - Page pruning, in vacuum phase I or on-access: Some items are
> >   *   redirected, some marked dead, and some removed altogether.
> >   *
> > - * - Freezing: Items are marked as 'frozen'.
> > + * - Freezing: During vacuum phase I, items are marked as 'frozen'
> >   *
> > - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
> > + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
> > + *   marked as unused.
> >   *
> > - * They have enough commonalities that we use a single WAL record for them
> > + * - VM updates: After vacuum phases I and III, the heap page may be marked
> > + *   all-visible and all-frozen.
> > + *
> > + * These changes all happen together, so we use a single WAL record for them
> >   * all.
> >   *
> >   * If replaying the record requires a cleanup lock, pass cleanup_lock =
> >   true.
>
> How's that related to the commit's subject?

Oops, I moved it to the relevant commit.

> > Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
> >  GlobalVisXidVisibleToAll()
> >
> > The function is currently only used to check whether a tuple’s xmax is
> > visible to all transactions (and thus removable). Upcoming changes will
> > also use it to test whether a tuple’s xmin is visible to all to
> > decide if a page can be marked all-visible in the visibility map.
> >
> > The new name, GlobalVisXidVisibleToAll(), better reflects this broader
> > purpose.
>
> If we want this - and I'm not convinced we do - I think it needs to go further
> and change the other uses of removable in
> procarray.c. ComputeXidHorizonsResult has a lot of related fields.
>
> There's also GetOldestNonRemovableTransactionId(),
> GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
> included in the renaming.

Okay, I see what you are saying. When you say you're not sure if we
want "this" -- do you mean using GlobalVisState for determining if
xmins are visible to all (which is required to set the VM on-access)
or do you mean renaming those functions?

If we're just talking about the renaming, looking at procarray.c, it
is full of the word "removable" because its functions were largely
used to examine and determine if everyone can see an xmax as committed
and thus if that tuple is removable from their perspective. But
nothing about the code that I can see means it has to be an xmax. We
could just as well use the functions to determine if everyone can see
an xmin as committed.

I don't see how we can leave the names as is and use it on xmins
because that tuple is _not_ removable based on testing if everyone can
see the xmin. So the function basically returns an incorrect result.

That being said, the problem with replacing "removable" with "visible
to all" -- which isn't _terrible_ -- is that we then have to replace
"nonremovable" with "not visible to all" -- which is terrible.

I think getting rid of "removable" from procarray.c would be an
improvement because the names of its variables and functions make that
file feel tightly coupled to vacuum and removing tuples when its
functionality actually isn't. So, the issue is coming up with
something palatable.

One alternative idea (that requires no renaming) is to add a wrapper
function somewhere outside procarray.c which invokes
GlobalVisTestIsRemovableXid() but is called something like
XidVisibleToAll() and is documented for use with xmins/etc. It would
avoid the messy work of coming up with a good name. What do you think?
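
A minimal sketch of what I mean by a wrapper, as standalone C so it can be
tried in isolation. DemoXid, DemoVisState, and demo_is_removable_xid() are
hypothetical stand-ins for TransactionId, GlobalVisState, and
GlobalVisTestIsRemovableXid(); the single-horizon comparison is a
simplification that ignores XID wraparound and is not how procarray.c
actually decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;       /* stand-in for TransactionId */

/* Stand-in for GlobalVisState: XIDs before the horizon are seen by everyone. */
typedef struct DemoVisState
{
    DemoXid     horizon;
} DemoVisState;

/* Stand-in for the existing "removable" test (simplified, no wraparound). */
bool
demo_is_removable_xid(const DemoVisState *state, DemoXid xid)
{
    return xid < state->horizon;
}

/*
 * Hypothetical wrapper, documented for use with xmins: same test, but named
 * for what the caller is asking rather than for tuple removal.
 */
bool
demo_xid_visible_to_all(const DemoVisState *state, DemoXid xmin)
{
    return demo_is_removable_xid(state, xmin);
}

/* The wrapper must agree with the underlying test on both sides. */
void
demo_check_wrapper(void)
{
    DemoVisState state = {.horizon = 100};

    assert(demo_xid_visible_to_all(&state, 50));
    assert(!demo_xid_visible_to_all(&state, 100));
    assert(!demo_xid_visible_to_all(&state, 150));
}
```

The wrapper adds no logic of its own; the only thing it buys is a name that
does not claim the tuple is removable.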

> > Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
> >  visibility
> >
> > This also benefits vacuum directly: in some cases, GlobalVisState may
> > advance during a vacuum, allowing more pages to become considered
> > all-visible. And, in the future, we could easily add a heuristic to
> > update GlobalVisState more frequently during vacuums of large tables. In
> > the rare case that the GlobalVisState moves backward, vacuum falls back
> > to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
> > wasn’t yet prunable according to the GlobalVisState.
>
> I think it may be better to make sure that the GlobalVisState can't move
> backward.

Do you mean that I shouldn't use the GlobalVisState to determine
visibility until I make sure it can't move backwards?

There is actually no functional difference in my patch set with the
code this commit message refers to (in heap_prune_satisfies_vacuum()).
I only mentioned it to make sure folks knew that even though I was
widening usage of GlobalVisState, we wouldn't encounter
synchronization issues with freezing horizons.

> > Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
> >
> > Many queries do not modify the underlying relation. For such queries, if
> > on-access pruning occurs during the scan, we can check whether the page
> > has become all-visible and update the visibility map accordingly.
> > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> > all-frozen.
>
> > Supporting this requires passing information about whether the relation
> > is modified from the executor down to the scan descriptor.
>
> I think it'd be good to split this part into a separate commit. The set of
> folks to review that are distinct (and broader) from the ones looking at
> heapam internals.

Good point. I've split it into 3 commits in this patch set (0011-0013)

- Melanie


Attachments:

  [text/x-patch] v23-0001-Simplify-vacuum-visibility-assertion.patch (1.4K, 2-v23-0001-Simplify-vacuum-visibility-assertion.patch)
  download | inline diff:
From 7d51aaf9fea35367e36d143828412727a44d63d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 10:42:53 -0500
Subject: [PATCH v23 01/14] Simplify vacuum visibility assertion

Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.

Rewrite the assertion to avoid an Assert(false).

Suggested by Andres Freund.
---
 src/backend/access/heap/vacuumlazy.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65bb0568a86..984d5879947 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2028,10 +2028,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
+		Assert(heap_page_is_all_visible(vacrel->rel, buf,
+										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.all_frozen == debug_all_frozen);
 
-- 
2.43.0



  [text/x-patch] v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch (12.3K, 3-v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch)
  download | inline diff:
From 7023583962f987cfde5450c8a2142574bb3ce84d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v23 02/14] Refactor lazy_scan_prune() VM set logic into helper

This commit is meant for ease of review only. It is a step towards
setting the VM in the same record as pruning and freezing in phase I of
vacuum. It isn't meant to be committed alone because it widens an
undesirable case where a heap buffer not marked dirty is stamped with an
LSN. If PD_ALL_VISIBLE is already set but the VM is not set, we won't
mark it dirty and then if checksums are enabled we will still stamp the
heap page LSN on a page not marked dirty.

Once the VM update is done in the same WAL record as pruning/freezing,
we will only set the LSN on the heap page if we set PD_ALL_VISIBLE or
made other heap page modifications.
---
 src/backend/access/heap/vacuumlazy.c | 283 ++++++++++++++-------------
 1 file changed, 146 insertions(+), 137 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 984d5879947..1cca095841e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1934,6 +1934,117 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and all_visible_according_to_vm. This
+ * function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bit and page hint. This does not need to be
+ * done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool all_visible_according_to_vm,
+					   const PruneFreezeResult *presult,
+					   uint8 *new_vmbits,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).
+	 *
+	 * We avoid relying on all_visible_according_to_vm as a proxy for the
+	 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+	 * stale.
+	 */
+	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+
+	/*
+	 * Determine what to set the visibility map bits to based on information
+	 * from the VM (as of last heap_vac_scan_next_block() call), and from
+	 * all_visible and all_frozen variables.
+	 */
+	if ((presult->all_visible && !all_visible_according_to_vm) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -1964,6 +2075,10 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
+	bool		do_set_vm = false;
+	bool		do_set_pd_vis = false;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
@@ -2075,28 +2190,22 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+	do_set_vm = heap_page_will_set_vis(rel,
+									   blkno,
+									   buf,
+									   vmbuffer,
+									   all_visible_according_to_vm,
+									   &presult,
+									   &new_vmbits,
+									   &do_set_pd_vis);
 
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	if (do_set_pd_vis)
+	{
 		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
 		 * NB: If the heap page is all-visible but the VM bit is not set, we
 		 * don't need to dirty the heap page.  However, if checksums are
 		 * enabled, we do need to make sure that the heap page is dirtied
@@ -2104,136 +2213,36 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Given that this situation should only happen in rare cases after a
 		 * crash, it is not worth optimizing.
 		 */
-		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+		PageSetAllVisible(page);
+	}
+
+	if (do_set_vm)
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
+									   new_vmbits);
 
 	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		vacrel->vm_new_visible_pages++;
+		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
-- 
2.43.0
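
For readers following the series, the decision made by heap_page_will_set_vis() and the counter bookkeeping that lazy_scan_prune() performs afterwards can be modelled outside the server. The following is only a minimal sketch under stated assumptions: every name prefixed with sketch_ or Sketch is hypothetical and not part of the patch, the corruption-repair branches are omitted, and buffer access is replaced by plain booleans. The two flag values mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

/* Hypothetical, simplified view of the inputs to the decision. */
typedef struct SketchPageState
{
	bool		all_visible;		/* pruning found all tuples visible to all */
	bool		all_frozen;			/* pruning found all tuples frozen */
	bool		pd_all_visible;		/* PD_ALL_VISIBLE currently set on the page */
	bool		vm_known_visible;	/* caller's earlier, possibly stale, VM read */
	bool		vm_all_frozen;		/* all-frozen bit currently set in the VM */
} SketchPageState;

/* Hypothetical counters standing in for the vm_new_* fields of LVRelState. */
typedef struct SketchCounters
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} SketchCounters;

/*
 * Decide whether the VM should be set and with which bits. Reports
 * separately whether the page-level hint needs setting, because the
 * caller's earlier VM reading is not a reliable proxy for it.
 */
static bool
sketch_will_set_vis(const SketchPageState *st,
					uint8_t *new_vmbits,
					bool *do_set_pd_vis)
{
	*new_vmbits = 0;
	*do_set_pd_vis = st->all_visible && !st->pd_all_visible;

	if ((st->all_visible && !st->vm_known_visible) ||
		(st->all_frozen && !st->vm_all_frozen))
	{
		*new_vmbits = SKETCH_VM_ALL_VISIBLE;
		if (st->all_frozen)
			*new_vmbits |= SKETCH_VM_ALL_FROZEN;
		return true;
	}

	return false;
}

/* Count a page as newly all-visible and/or newly all-frozen. */
static void
sketch_count_vm_change(uint8_t old_vmbits, uint8_t new_vmbits,
					   SketchCounters *counters)
{
	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0)
	{
		counters->new_visible++;
		if ((new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
			counters->new_visible_frozen++;
	}
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
	{
		/* all-frozen is only ever set together with all-visible */
		assert((new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0);
		counters->new_frozen++;
	}
}
```

In this model, a page that was already all-visible in the VM and is now found all-frozen takes the second branch of sketch_count_vm_change() and is counted only as newly frozen, while a page with no VM bits set is counted as newly visible and, if frozen, as newly visible-and-frozen.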



  [text/x-patch] v23-0003-Set-the-VM-in-prune-code.patch (26.0K, 4-v23-0003-Set-the-VM-in-prune-code.patch)
From 40b506a888ef57f5b962b320b817b97e64c9c4c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v23 03/14] Set the VM in prune code

For review only, this moves the code to set the VM into
heap_page_prune_and_freeze() as a step toward having it in the same WAL
record.
---
 src/backend/access/heap/pruneheap.c  | 281 ++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 166 +---------------
 src/include/access/heapam.h          |  27 +++
 3 files changed, 272 insertions(+), 202 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5af84b4c875..0daf3abf717 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,14 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneFreezeResult *presult,
+								   uint8 *new_vmbits,
+								   bool *do_set_pd_vis);
 
 
 /*
@@ -280,6 +291,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
+				.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -338,6 +351,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -386,51 +401,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up to date
+	 * while the page is still all-visible. As soon as a tuple is encountered
+	 * that is not visible to all, it is no longer maintained. While it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -765,10 +783,131 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and blk_known_av. Some callers may
+ * already have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along as blk_known_av.
+ * Callers that have not previously checked the page's status in the VM should
+ * pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bit and page hint. This does
+ * not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneFreezeResult *presult,
+					   uint8 *new_vmbits,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).
+	 *
+	 * We avoid relying on blk_known_av as a proxy for the page-level
+	 * PD_ALL_VISIBLE bit being set, since it might have become stale and may
+	 * not be provided by all callers.
+	 */
+	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+
+	/*
+	 * Determine what the visibility map bits should be set to using the
+	 * values of all_visible and all_frozen determined during
+	 * pruning/freezing.
+	 */
+	if ((presult->all_visible && !blk_known_av) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * Callers that did not check the visibility map, and so pass false for
+	 * blk_known_av, never reach this check. The cost of potentially reading
+	 * the visibility map for pages that are not all-visible is too high to
+	 * justify generalizing it.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -783,12 +922,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +953,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -1005,6 +1149,51 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = false;
+	if (prstate.attempt_update_vm)
+		do_set_vm = heap_page_will_set_vis(params->relation,
+										   blockno,
+										   buffer,
+										   vmbuffer,
+										   params->blk_known_av,
+										   presult,
+										   &presult->new_vmbits,
+										   &do_set_pd_vis);
+
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || presult->new_vmbits == 0);
+
+	if (do_set_pd_vis)
+	{
+		/*
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		MarkBufferDirty(buffer);
+		PageSetAllVisible(page);
+	}
+
+	presult->old_vmbits = 0;
+	if (do_set_vm)
+		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+												InvalidXLogRecPtr,
+												vmbuffer, presult->vm_conflict_horizon,
+												presult->new_vmbits);
 }
 
 
@@ -1479,6 +1668,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1cca095841e..f5617335cb2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1935,116 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 }
 
 
-/*
- * Decide whether to set the visibility map bits for heap_blk, using
- * information from PruneFreezeResult and all_visible_according_to_vm. This
- * function does not actually set the VM bit or page-level hint,
- * PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bit and page hint. This does not need to be
- * done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with the desired
- * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
- * PD_ALL_VISIBLE should be set on the heap page.
- */
-static bool
-heap_page_will_set_vis(Relation relation,
-					   BlockNumber heap_blk,
-					   Buffer heap_buf,
-					   Buffer vmbuffer,
-					   bool all_visible_according_to_vm,
-					   const PruneFreezeResult *presult,
-					   uint8 *new_vmbits,
-					   bool *do_set_pd_vis)
-{
-	Page		heap_page = BufferGetPage(heap_buf);
-
-	*new_vmbits = 0;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled).
-	 *
-	 * We avoid relying on all_visible_according_to_vm as a proxy for the
-	 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-	 * stale.
-	 */
-	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
-
-	/*
-	 * Determine what to set the visibility map bits to based on information
-	 * from the VM (as of last heap_vac_scan_next_block() call), and from
-	 * all_visible and all_frozen variables.
-	 */
-	if ((presult->all_visible && !all_visible_according_to_vm) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
-	{
-		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		return true;
-	}
-
-	/*
-	 * Now handle two potential corruption cases:
-	 *
-	 * These do not need to happen in a critical section and are not
-	 * WAL-logged.
-	 *
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
-			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buf);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2075,16 +1965,14 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
-	bool		do_set_vm = false;
-	bool		do_set_pd_vis = false;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
+		.blk_known_av = all_visible_according_to_vm,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
@@ -2187,60 +2075,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	do_set_vm = heap_page_will_set_vis(rel,
-									   blkno,
-									   buf,
-									   vmbuffer,
-									   all_visible_according_to_vm,
-									   &presult,
-									   &new_vmbits,
-									   &do_set_pd_vis);
-
-
-	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
-	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
-
-	if (do_set_pd_vis)
-	{
-		/*
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		MarkBufferDirty(buf);
-		PageSetAllVisible(page);
-	}
-
-	if (do_set_vm)
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   new_vmbits);
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
-		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..ce9cfbdc767 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block of
+	 * the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block(). Callers which did not check the
+	 * visibility map already should pass false for blk_known_av. This is only
+	 * an optimization for callers that did check the VM and won't affect
+	 * correctness.
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
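
For readers following the lazy_scan_prune() hunk above, the counter
bookkeeping can be read in isolation as a function of old_vmbits and
new_vmbits. The following is a minimal standalone sketch, not code from
the patch: VmCounters and count_vm_transition() are hypothetical names,
and the struct only stands in for the three LVRelState counters. The two
bit values match PostgreSQL's visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the vm_new_* counters in LVRelState */
typedef struct VmCounters
{
	long		new_visible_pages;
	long		new_visible_frozen_pages;
	long		new_frozen_pages;
} VmCounters;

/*
 * Update the counters for one heap page given the VM bits before and
 * after pruning.  Returns true when the page was newly set all-frozen
 * in the VM (the value the patch stores in *vm_page_frozen).
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page was not all-visible before and is now */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			return true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Already all-visible; all-frozen implies all-visible */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		return true;
	}

	return false;
}
```

A page whose VM bits were not touched reports old_vmbits == new_vmbits
== 0 in this sketch, so none of the counters move.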



  [text/x-patch] v23-0004-Move-VM-assert-into-prune-freeze-code.patch (14.8K, 5-v23-0004-Move-VM-assert-into-prune-freeze-code.patch)
From 9aa0ec2b5fae04762128fbec329a23139fb5b4a4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v23 04/14] Move VM assert into prune/freeze code

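For review only, this commit moves the assertion that cross-checks the
page's all-visible and all-frozen status into the prune/freeze code,
ahead of setting the VM. This allows us to remove the all_visible,
all_frozen, and vm_conflict_horizon fields from PruneFreezeResult.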

This will get squashed into a larger commit to set the VM in the same
record where we prune and freeze.
---
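Reviewer note: the part of heap_page_will_set_vis() visible in this
patch's hunks reduces to a small decision over a handful of booleans.
The sketch below is a standalone model of that visible part only, under
stated assumptions: VisInputs, VisDecision and decide_set_vis() are
hypothetical names, the page and VM lookups are replaced by plain
booleans, and the corruption-check branches that fall outside the hunks
are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical bundle of the facts the real function reads */
typedef struct VisInputs
{
	bool		attempt_update_vm;	/* prstate->attempt_update_vm */
	bool		all_visible;		/* prstate->all_visible */
	bool		all_frozen;			/* prstate->all_frozen */
	bool		blk_known_av;		/* caller saw the VM bit set earlier */
	bool		page_all_visible;	/* stands in for PageIsAllVisible() */
	bool		vm_all_frozen;		/* stands in for VM_ALL_FROZEN() */
} VisInputs;

/* Hypothetical bundle of the return value and the two out parameters */
typedef struct VisDecision
{
	bool		do_set_vm;
	bool		do_set_pd_vis;
	uint8_t		new_vmbits;
} VisDecision;

static VisDecision
decide_set_vis(const VisInputs *in)
{
	VisDecision d = {false, false, 0};

	/* Callers that do not update the VM never track these flags */
	if (!in->attempt_update_vm)
	{
		assert(!in->all_visible && !in->all_frozen);
		return d;
	}

	/* Set PD_ALL_VISIBLE only if the page qualifies and lacks it */
	d.do_set_pd_vis = in->all_visible && !in->page_all_visible;

	/* Set VM bits when they would change what the VM records */
	if ((in->all_visible && !in->blk_known_av) ||
		(in->all_frozen && !in->vm_all_frozen))
	{
		d.new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
		if (in->all_frozen)
			d.new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
		d.do_set_vm = true;
	}

	/* Corruption checks from the real function are omitted here */
	return d;
}
```

The case of an already all-visible page that becomes all-frozen shows
up as do_set_vm = true with do_set_pd_vis = false, which is the
"update only the VM" path described in the new comment in
heap_page_prune_and_freeze().
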
 src/backend/access/heap/pruneheap.c  | 142 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c |  68 +------------
 src/include/access/heapam.h          |  25 ++---
 3 files changed, 111 insertions(+), 124 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0daf3abf717..2512b5d83e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneFreezeResult *presult,
+								   const PruneState *prstate,
 								   uint8 *new_vmbits,
 								   bool *do_set_pd_vis);
 
@@ -785,8 +785,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 
 /*
  * Decide whether to set the visibility map bits for heap_blk, using
- * information from PruneFreezeResult and blk_known_av. Some callers may
- * already have examined this page’s VM bits (e.g., VACUUM in the previous
+ * information from PruneState and blk_known_av. Some callers may already have
+ * examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along as blk_known_av.
  * Callers that have not previously checked the page's status in the VM should
  * pass false for blk_known_av.
@@ -808,13 +808,20 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneFreezeResult *presult,
+					   const PruneState *prstate,
 					   uint8 *new_vmbits,
 					   bool *do_set_pd_vis)
 {
 	Page		heap_page = BufferGetPage(heap_buf);
 
 	*new_vmbits = 0;
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		return false;
+	}
 
 	/*
 	 * It should never be the case that the visibility map page is set while
@@ -825,22 +832,19 @@ heap_page_will_set_vis(Relation relation,
 	 * PD_ALL_VISIBLE bit being set, since it might have become stale and may
 	 * not be provided by all callers.
 	 */
-	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+	*do_set_pd_vis = prstate->all_visible & !PageIsAllVisible(heap_page);
 
 	/*
 	 * Determine what the visibility map bits should be set to using the
 	 * values of all_visible and all_frozen determined during
 	 * pruning/freezing.
 	 */
-	if ((presult->all_visible && !blk_known_av) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
 	{
 		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+		if (prstate->all_frozen)
 			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		return true;
 	}
@@ -887,7 +891,7 @@ heap_page_will_set_vis(Relation relation,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -903,6 +907,30 @@ heap_page_will_set_vis(Relation relation,
 	return false;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -956,6 +984,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1116,23 +1145,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1150,20 +1164,68 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
 
-	do_set_vm = false;
-	if (prstate.attempt_update_vm)
-		do_set_vm = heap_page_will_set_vis(params->relation,
-										   blockno,
-										   buffer,
-										   vmbuffer,
-										   params->blk_known_av,
-										   presult,
-										   &presult->new_vmbits,
-										   &do_set_pd_vis);
+		Assert(prstate.all_frozen == debug_all_frozen);
 
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno,
+									   buffer,
+									   vmbuffer,
+									   params->blk_known_av,
+									   &prstate,
+									   &presult->new_vmbits,
+									   &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
 	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
@@ -1192,7 +1254,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (do_set_vm)
 		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
 												InvalidXLogRecPtr,
-												vmbuffer, presult->vm_conflict_horizon,
+												vmbuffer, vm_conflict_horizon,
 												presult->new_vmbits);
 }
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f5617335cb2..4aa425ec945 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2016,32 +2002,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3496,29 +3456,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3542,15 +3479,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce9cfbdc767..b20096b6ca1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
 	 * in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (21.4K, 6-v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From a407673cb2632d4544cc56458dbf4a063da2067c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v23 05/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.
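
On the replay side, the heapam_xlog.c hunk in this patch decides
separately whether to set PD_ALL_VISIBLE, dirty the heap buffer, and
stamp the page LSN. As an illustration only, that decision can be
modeled with plain booleans. RedoActions and decide_redo_actions() are
hypothetical names, and hint_bits_need_wal stands in for
XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_VALID_BITS	0x03

/* Hypothetical summary of what redo does to the heap page */
typedef struct RedoActions
{
	bool		set_pd_all_visible;
	bool		mark_buffer_dirty;
	bool		set_lsn;
} RedoActions;

static RedoActions
decide_redo_actions(bool do_prune, int nplans, uint8_t vmflags,
					bool page_all_visible, bool hint_bits_need_wal)
{
	RedoActions a = {false, false, false};

	/* Pruning or freezing always modifies the heap page */
	if (do_prune || nplans > 0)
		a.mark_buffer_dirty = a.set_lsn = true;

	/*
	 * If the record sets VM bits and PD_ALL_VISIBLE is not yet set, set
	 * it now.  When only the VM changes, the heap page stays clean.
	 */
	if ((vmflags & VISIBILITYMAP_VALID_BITS) && !page_all_visible)
	{
		a.set_pd_all_visible = true;
		a.mark_buffer_dirty = true;
		if (hint_bits_need_wal)
			a.set_lsn = true;
	}

	/* A buffer must be dirtied before it is stamped with an LSN */
	assert(!a.set_lsn || a.mark_buffer_dirty);

	return a;
}
```

For a record that only marks an already all-visible page all-frozen,
this model leaves all three actions false, matching the comment that
the heap page need not be dirtied in that case.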

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
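Reviewer note: get_conflict_xid(), added in the pruneheap.c hunk below,
can be exercised outside the server with a small stand-in. This is a
sketch under stated assumptions, not the patch itself:
conflict_xid_sketch() and xid_follows() are hypothetical names, and
xid_follows() is a simplified copy of the wraparound-aware comparison
that TransactionIdFollows() performs in transam.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Simplified stand-in for TransactionIdFollows(): is id1 newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special XIDs compare as plain unsigned integers */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;

	/* Normal XIDs compare modulo 2^32 */
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical model of get_conflict_xid() from the patch */
static TransactionId
conflict_xid_sketch(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid,
					bool blk_already_av)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* new_vmbits is only meaningful when the VM will be set */
	assert(do_set_vm || new_vmbits == 0);

	/* Only marking an already all-visible page all-frozen: no conflict */
	if (!do_prune && !do_freeze && do_set_vm && blk_already_av &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
		return InvalidTransactionId;

	/* Start from the VM or freeze horizon, whichever applies */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer xmax among removed tuples takes precedence */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

The point of folding the three horizons into one value is that the
combined prune/freeze/VM record carries a single
snapshotConflictHorizon, so it has to be conservative enough for every
change the record makes.
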
 src/backend/access/heap/heapam_xlog.c |  48 +++--
 src/backend/access/heap/pruneheap.c   | 294 +++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c  |   1 +
 src/include/access/heapam.h           |   1 +
 4 files changed, 212 insertions(+), 132 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..b1ceab71928 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,39 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit unset.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		/* We should always mark a buffer dirty before stamping with an LSN */
+		Assert(!set_lsn || mark_buffer_dirty);
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
@@ -246,10 +270,10 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 /*
  * Replay XLOG_HEAP2_VISIBLE records.
  *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * The critical integrity requirement here is that we must never end up with a
+ * situation where the visibility map bit is set, and the page-level
+ * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent page
+ * modification would fail to clear the visibility map bit.
  */
 static void
 heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2512b5d83e3..b851d723c74 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid,
+									  bool blk_already_av);
 static bool heap_page_will_set_vis(Relation relation,
 								   BlockNumber heap_blk,
 								   Buffer heap_buf,
@@ -783,6 +789,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Decide whether to set the visibility map bits for heap_blk, using
  * information from PruneState and blk_known_av. Some callers may already have
@@ -984,7 +1048,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -993,6 +1056,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1058,6 +1124,39 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+									prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1079,14 +1178,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1100,36 +1202,33 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
+
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			Assert(old_vmbits != new_vmbits);
+		}
 
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1139,43 +1238,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1190,7 +1254,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1200,62 +1265,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	/*
-	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
-	 * based on information from the VM and the all_visible/all_frozen flags.
-	 *
-	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
-	 * VM bit is clear, we strongly prefer to keep them in sync.
-	 *
-	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
-	 * already been set. Setting only the VM is most common when setting an
-	 * already all-visible page all-frozen.
-	 */
-	do_set_vm = heap_page_will_set_vis(params->relation,
-									   blockno,
-									   buffer,
-									   vmbuffer,
-									   params->blk_known_av,
-									   &prstate,
-									   &presult->new_vmbits,
-									   &do_set_pd_vis);
-
-	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
-	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || presult->new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	if (do_set_pd_vis)
+	if (prstate.attempt_freeze)
 	{
-		/*
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		MarkBufferDirty(buffer);
-		PageSetAllVisible(page);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	presult->old_vmbits = 0;
-	if (do_set_vm)
-		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
-												InvalidXLogRecPtr,
-												vmbuffer, vm_conflict_horizon,
-												presult->new_vmbits);
 }
 
 
@@ -2387,14 +2426,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
+ *
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2406,6 +2449,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2415,6 +2467,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2451,7 +2504,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2569,7 +2622,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4aa425ec945..0d39d57115d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2776,6 +2776,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b20096b6ca1..14c1d92604d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -435,6 +435,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
-- 
2.43.0
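
A note on the log_heap_prune_and_freeze() hunks in the patch above: the decision to omit the heap full-page image and the decision to advance the heap page LSN depend on the same three inputs. The sketch below models both conditions with plain booleans so their relationship is easy to check. It is not PostgreSQL code: both function names are hypothetical, and the hint_bits_needed parameter is an assumption standing in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of two conditions in log_heap_prune_and_freeze().
 * Hypothetical names; hint_bits_needed stands in for
 * XLogHintBitIsNeeded().
 */

/* True when the heap buffer could be registered with REGBUF_NO_IMAGE. */
bool
sketch_can_skip_heap_fpi(bool do_prune, int nfrozen,
						 bool set_pd_all_vis, bool hint_bits_needed)
{
	return !do_prune &&
		nfrozen == 0 &&
		(!set_pd_all_vis || !hint_bits_needed);
}

/* True when the heap page LSN would be advanced to the record's LSN. */
bool
sketch_should_set_heap_lsn(bool do_prune, int nfrozen,
						   bool set_pd_all_vis, bool hint_bits_needed)
{
	return do_prune || nfrozen > 0 ||
		(set_pd_all_vis && hint_bits_needed);
}
```

The two predicates are exact complements, so in this model a record that omits the heap FPI never advances the heap page LSN. The case the patch calls out, setting an already all-visible page all-frozen, corresponds to do_prune = false, nfrozen = 0, set_pd_all_vis = false: the FPI is skipped even when checksums or wal_log_hints are enabled.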



  [text/x-patch] v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From fd1518bb82741f8b0e554206c2e35a64bf12fbc3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v23 06/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0d39d57115d..d03442abcc1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
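
For readers unfamiliar with what the visibilitymap_set_vmbits() call in the patch above does to the map page, the following is a toy model of the bit update, assuming two VM bits per heap block packed four blocks to a byte. Everything prefixed with sketch_ or SKETCH_ is hypothetical, the 16-byte map is an arbitrary toy size, and buffer locking, dirtying, and WAL are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ALL_VISIBLE			0x01
#define SKETCH_ALL_FROZEN			0x02
#define SKETCH_VALID_BITS			0x03
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_HEAPBLOCKS_PER_BYTE	4
#define SKETCH_MAP_BYTES			16	/* toy size, covers 64 heap blocks */

/*
 * Set 'flags' for heap block 'heap_blk' in a toy map and return the bits
 * that were set before the call.  Bits are only ever OR-ed in, never
 * cleared.
 */
uint8_t
sketch_vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / SKETCH_HEAPBLOCKS_PER_BYTE;
	uint8_t		offset = (heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
		SKETCH_BITS_PER_HEAPBLOCK;
	uint8_t		old;

	assert(map_byte < SKETCH_MAP_BYTES);
	assert((flags & SKETCH_VALID_BITS) == flags);
	/* all-frozen is never set without all-visible */
	assert(flags != SKETCH_ALL_FROZEN);

	old = (map[map_byte] >> offset) & SKETCH_VALID_BITS;
	map[map_byte] |= (uint8_t) (flags << offset);
	return old;
}
```

Returning the previous bits is what lets a caller tell a newly set page from one that was already marked, which is how the heap_page_prune_and_freeze() hunk can assert that old_vmbits differs from new_vmbits.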



  [text/x-patch] v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.4K, 8-v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From faf079025c6354fb2fcb0695da29118c476ae4dd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v23 07/14] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type. This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   6 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 46 insertions(+), 375 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..3ad78ba4694 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b1ceab71928..f0de2c136a0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -254,7 +254,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -267,142 +267,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with a
- * situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent page
- * modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -780,8 +644,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -793,11 +657,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1378,9 +1242,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b851d723c74..7a778ad3bad 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1211,9 +1211,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_set_vm)
 		{
 			Assert(PageIsAllVisible(page));
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			Assert(old_vmbits != new_vmbits);
 		}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d03442abcc1..5d88a1592e3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2781,9 +2781,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cf3f6a7dafd..a139705de01 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4302,7 +4302,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.1K, 9-v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 221a5db2d8a7cee07053c30f635be0b27bae2242 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v23 08/14] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order to
decide whether a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7a778ad3bad..1a3c7cf1ef5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -253,7 +253,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -487,7 +487,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1327,11 +1327,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1790,7 +1790,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
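
The renamed functions in the patch above answer one question: is this XID
older than anything any backend might still need? The following is a rough,
standalone model of the two-bound test and of the 32-bit wrapper, not the
PostgreSQL code. All Toy* names are hypothetical, the field names
maybe_needed and definitely_needed are borrowed from the comments in
procarray.c, and the "in between" case is simply treated as not visible here,
whereas the real function may try to refresh its horizons first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; these are not the PostgreSQL typedefs. */
typedef uint32_t ToyXid;
typedef uint64_t ToyFullXid;

typedef struct ToyVisState
{
	ToyFullXid	maybe_needed;		/* anything older is visible to all */
	ToyFullXid	definitely_needed;	/* anything at or after is not */
} ToyVisState;

/*
 * Widen a 32-bit xid to 64 bits by assuming it lies within 2^31 of the
 * 64-bit reference value 'rel'. This is only meaningful when the xid comes
 * from a source protected against wraparound. The unsigned-to-int32_t cast
 * relies on two's complement behavior, which common compilers provide.
 */
static ToyFullXid
toy_full_xid_relative_to(ToyFullXid rel, ToyXid xid)
{
	ToyXid		rel_xid = (ToyXid) rel;	/* low 32 bits of the reference */
	int32_t		diff = (int32_t) (xid - rel_xid);

	return rel + (int64_t) diff;
}

/* Two-bound test on a 64-bit xid. */
static bool
toy_full_xid_visible_to_all(const ToyVisState *state, ToyFullXid fxid)
{
	if (fxid < state->maybe_needed)
		return true;
	if (fxid >= state->definitely_needed)
		return false;

	/* Toy assumption: be conservative between the two bounds. */
	return false;
}

/* 32-bit wrapper, mirroring the shape of the wrapper in the patch. */
static bool
toy_xid_visible_to_all(const ToyVisState *state, ToyXid xid)
{
	ToyFullXid	fxid = toy_full_xid_relative_to(state->definitely_needed, xid);

	return toy_full_xid_visible_to_all(state, fxid);
}
```

The widening step is why the 32-bit wrapper carries the wraparound caveat: an
xid more than 2^31 away from the reference would be mapped to the wrong epoch.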



  [text/x-patch] v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.6K, 10-v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From c75f7d1281fadf5c49e37577ef42ff96b92b3f59 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v23 09/14] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1a3c7cf1ef5..d836bbeaf52 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -450,11 +450,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -980,14 +981,13 @@ heap_page_will_set_vis(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1078,6 +1078,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1255,10 +1265,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1787,20 +1796,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5d88a1592e3..c0e1350cb11 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2735,7 +2735,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3495,7 +3495,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3511,7 +3511,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3604,7 +3604,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 14c1d92604d..4702ec00dea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -446,7 +445,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -460,6 +459,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
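
The commit message above describes deferring the horizon check: track the
newest xmin while scanning the page, then test that single value once. A
minimal standalone sketch of that idea follows. It is not the patch's code:
the Toy* names are hypothetical, the horizon is modeled as one XID instead of
a GlobalVisState, and toy_horizon_checks exists only to show how many times
the "expensive" test runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			0u
#define TOY_FIRST_NORMAL_XID	3u

/* Counts calls to the stand-in for the expensive horizon test. */
static int	toy_horizon_checks = 0;

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Does a logically follow b? Modulo-2^32 comparison for normal xids. */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (!toy_xid_is_normal(a) || !toy_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical stand-in for the GlobalVisState test: xid precedes horizon. */
static bool
toy_visible_to_all(ToyXid horizon, ToyXid xid)
{
	toy_horizon_checks++;
	return (int32_t) (xid - horizon) < 0;
}

/*
 * Given the xmins of the live tuples on a page, decide whether the page can
 * be considered all-visible, testing only the newest xmin against the
 * horizon.
 */
static bool
toy_page_all_visible(const ToyXid *xmins, size_t n, ToyXid horizon)
{
	ToyXid		cutoff = TOY_INVALID_XID;

	/* Cheap per-tuple work: remember the newest normal xmin. */
	for (size_t i = 0; i < n; i++)
	{
		if (toy_xid_is_normal(xmins[i]) && toy_xid_follows(xmins[i], cutoff))
			cutoff = xmins[i];
	}

	/* One horizon test per page, skipped if no normal xmin was seen. */
	if (toy_xid_is_normal(cutoff) && !toy_visible_to_all(horizon, cutoff))
		return false;

	return true;
}
```

The trade-off matches the one the commit message names: every xmin on the page
is examined, even when an early tuple would already have ruled the page out
under a single-XID comparison.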



  [text/x-patch] v23-0010-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 11-v23-0010-Unset-all_visible-sooner-if-not-freezing.patch)
From 23236fd69abba5d481ff228d0fe1486ba40eddf3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v23 10/14] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d836bbeaf52..03b8ddcc38d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1653,8 +1653,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1913,8 +1918,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
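
As an illustration of the control flow in the patch above, here is a reduced
model with a hypothetical ToyPruneState that keeps only the four fields
involved. It is a sketch under those assumptions, not the pruneheap.c code:
toy_finish_page() stands in for the later point at which the real code clears
the flags when dead items remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical version of the prune state. */
typedef struct ToyPruneState
{
	bool		attempt_freeze;	/* will we try to freeze this page? */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} ToyPruneState;

/* Record one dead item. */
static void
toy_record_dead(ToyPruneState *ps)
{
	/*
	 * When freezing may be attempted, leave the flags alone for now so that
	 * removable dead tuples do not preclude freezing. Otherwise there is
	 * nothing to wait for, so clear them immediately.
	 */
	if (!ps->attempt_freeze)
	{
		ps->all_visible = false;
		ps->all_frozen = false;
	}

	ps->lpdead_items++;
}

/* End of page processing: remaining dead items rule out all-visible. */
static void
toy_finish_page(ToyPruneState *ps)
{
	if (ps->lpdead_items > 0)
	{
		ps->all_visible = false;
		ps->all_frozen = false;
	}
}
```

Both paths end in the same state once a dead item has been recorded. The only
difference is when the flags are cleared, which is what lets later per-tuple
bookkeeping be skipped in the non-freezing case.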



  [text/x-patch] v23-0011-Track-which-relations-are-modified-by-a-query.patch (2.5K, 12-v23-0011-Track-which-relations-are-modified-by-a-query.patch)
From be0f2805786ba0d4711c39fe4896a3d6f51feba1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v23 11/14] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
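
To make the intent of es_modified_relids concrete, the sketch below models
the bitmap with a single 64-bit word and shows how a scan might consult it.
Everything here is an assumption for illustration: the real field is a
Bitmapset filled with bms_add_member(), the Toy* names are hypothetical, and
toy_scan_may_set_vm() only paraphrases the policy stated in the commit message
(do not set the VM on a relation the query is about to modify).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_RTI 64

/* Hypothetical executor state holding one bit per range table index. */
typedef struct ToyEState
{
	uint64_t	modified_relids;
} ToyEState;

/* Mark the relation at range table index 'rti' (1-based) as modified. */
static void
toy_mark_modified(ToyEState *estate, int rti)
{
	assert(rti >= 1 && rti <= TOY_MAX_RTI);
	estate->modified_relids |= UINT64_C(1) << (rti - 1);
}

/* A scan of 'rti' may set the VM only if the query does not modify it. */
static bool
toy_scan_may_set_vm(const ToyEState *estate, int rti)
{
	assert(rti >= 1 && rti <= TOY_MAX_RTI);
	return (estate->modified_relids & (UINT64_C(1) << (rti - 1))) == 0;
}
```

In this model both result relations and relations with row marks would be
passed to toy_mark_modified(), matching the two places the patch adds members
to the set.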



  [text/x-patch] v23-0012-Pass-down-information-on-table-modification-to-s.patch (23.0K, 13-v23-0012-Pass-down-information-on-table-modification-to-s.patch)
From 804296ddbb6b3553d37492d2f79d034df71fd3e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v23 12/14] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  4 ++--
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 91 insertions(+), 44 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index f87c60a230c..645688f9241 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..d7fac94826d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c96917085c2..9d425504e1b 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..58dbbf4d851 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 47d5047fe8b..055759cd343 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index dd323c9b9fd..b41bfeca244 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4702ec00dea..fc2c8314e97 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
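
To make the flag plumbing in patch 12 easier to follow, here is a small standalone sketch of the idea. It is not PostgreSQL code and does not compile against the tree. Every name prefixed with demo_/DEMO_ is hypothetical, the uint64_t bitmask is only a stand-in for the es_modified_relids Bitmapset, and the bit positions other than 1 << 10 for the read-only hint are illustrative. It shows a scan node computing the hint from the set of modified relations, a begin-scan wrapper OR-ing in its own option bits without discarding the caller's flags, and the access method later testing the hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative option bits. Only the value of the read-only hint (1 << 10)
 * is taken from the patch; the other values are assumptions for the demo.
 */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 6,
	DEMO_SO_HINT_REL_READ_ONLY = 1 << 10
};

/* Hypothetical stand-in for bms_is_member(scanrelid, es_modified_relids). */
static bool
demo_rel_is_modified(uint64_t modified_relids, int scanrelid)
{
	assert(scanrelid >= 0 && scanrelid < 64);
	return ((modified_relids >> scanrelid) & 1) != 0;
}

/* What each scan node does: set the hint only if the rel is not modified. */
static uint32_t
demo_scan_hint_flags(uint64_t modified_relids, int scanrelid)
{
	if (!demo_rel_is_modified(modified_relids, scanrelid))
		return DEMO_SO_HINT_REL_READ_ONLY;
	return 0;
}

/*
 * What the begin-scan wrappers do after the patch: OR their own option bits
 * into the caller-supplied flags instead of starting from a fresh local.
 */
static uint32_t
demo_beginscan(uint32_t flags)
{
	flags |= DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE;
	return flags;
}

/* What the access method checks before letting pruning touch the VM. */
static bool
demo_may_set_vm(uint32_t scan_flags)
{
	return (scan_flags & DEMO_SO_HINT_REL_READ_ONLY) != 0;
}
```

With relation 2 marked as modified, a scan of relation 1 carries the hint through demo_beginscan() and demo_may_set_vm() returns true, while a scan of relation 2 gets flags without the hint. Callers that pass 0, as most call sites in the patch do, keep the pre-patch behavior.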



  [text/x-patch] v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (13.9K, 14-v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 422613795deaaef2bdd43cd0767e019cbdd44f50 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v23 13/14] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 78 ++++++++++++++++---
 src/include/access/heapam.h                   | 24 +++++-
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 116 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3ad78ba4694..ecc04390ac7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d7fac94826d..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 03b8ddcc38d..04f10054402 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *new_vmbits,
 								   bool *do_set_pd_vis);
 
@@ -221,9 +223,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -305,6 +311,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -863,6 +876,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
  * corrupted, it will fix them by clearing the VM bit and page hint. This does
  * not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
  * PD_ALL_VISIBLE should be set on the heap page.
@@ -873,7 +889,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *new_vmbits,
 					   bool *do_set_pd_vis)
 {
@@ -888,6 +906,24 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	/*
 	 * It should never be the case that the visibility map page is set while
 	 * the page-level bit is clear, but the reverse is allowed (if checksums
@@ -921,14 +957,15 @@ heap_page_will_set_vis(Relation relation,
 	 * WAL-logged.
 	 *
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
 	 *
 	 * Callers which did not check the visibility map and determine
 	 * blk_known_av will not be eligible for this, however the cost of
 	 * potentially needing to read the visibility map for pages that are not
-	 * all-visible is too high to justify generalizing the check.
+	 * all-visible is too high to justify generalizing the check. A future
+	 * vacuum will have to take care of fixing the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1149,6 +1186,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -1224,13 +1262,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			old_vmbits = visibilitymap_set(blockno,
 										   vmbuffer, new_vmbits,
 										   params->relation->rd_locator);
-			Assert(old_vmbits != new_vmbits);
+
+			/*
+			 * If on-access pruning set the VM in between when vacuum first
+			 * checked the visibility map and determined blk_known_av and when
+			 * we actually prune the page, we could end up trying to set the
+			 * VM only to find it is already set.
+			 */
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
 		}
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only planning to update the VM, and it turns out that it was
+		 * already set, there is no need to emit WAL. As such, we must check
+		 * that some change is required again.
 		 */
-		if (RelationNeedsWAL(params->relation))
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
 		{
 			log_heap_prune_and_freeze(params->relation, buffer,
 									  do_set_vm ? vmbuffer : InvalidBuffer,
@@ -2440,8 +2494,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fc2c8314e97..3f2b5eedfff 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v23-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 15-v23-0014-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 59d58156716426668022402d030cf8de7fcac928 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v23 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ecc04390ac7..d5f3f897dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f0de2c136a0..cf62a8df67c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -465,6 +465,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -614,9 +620,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-03 23:08  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-03 23:08 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Nov 24, 2025 at 3:08 AM Chao Li <[email protected]> wrote:
>
> > On Nov 21, 2025, at 09:09, Chao Li <[email protected]> wrote:
> >
> > I’ll stop here for today and continue reviewing the remaining commits next week.
>
> I continued reviewing today.

I incorporated all your feedback in my recently posted v23. Thanks for
the review!

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-04 05:10  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Chao Li @ 2025-12-04 05:10 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi Melanie,

I revisited this patch again today. I reviewed 0001-0004 and have a few more comments:

> On Dec 4, 2025, at 07:07, Melanie Plageman <[email protected]> wrote:
> 
> <v23-0001-Simplify-vacuum-visibility-assertion.patch><v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v23-0003-Set-the-VM-in-prune-code.patch><v23-0004-Move-VM-assert-into-prune-freeze-code.patch><v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v23-0010-Unset-all_visible-sooner-if-not-freezing.patch><v23-0011-Track-which-relations-are-modified-by-a-query.patch><v23-0012-Pass-down-information-on-table-modification-to-s.patch><v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v23-0014-Set-pd_prune_xid-on-insert.patch>

1 - 0002
```
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool all_visible_according_to_vm,
+					   const PruneFreezeResult *presult,
+					   uint8 *new_vmbits,
+					   bool *do_set_pd_vis)
```

Actually, I wanted to comment on the new function name in the last round of review, but I guess I missed that.

I was confused about what “set_vis” means, and eventually figured out that “vis” stands for “visibility”. In the function name, “vis” actually means “visibility map bits”, while in the last parameter’s name, “do_set_pd_vis”, “vis” means the “PD_ALL_VISIBLE” bit. Using “vis” for two different things is confusing.

How about renaming the function to “heap_page_will_set_vm_bits” and the last parameter to “do_set_all_visible”?

2 - 0002
```
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and all_visible_according_to_vm. This
+ * function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bit and page hint. This does not need to be
+ * done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
```

This function comment mentions PD_ALL_VISIBLE twice but never mentions ALL_FROZEN, so “Returns true if one or both VM bits should be set” feels unclear. How about rephrasing it as “Returns true if the all-visible and/or all-frozen VM bits should be set”?

3 - 0002
```
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
```

Here in the comment and error message, I guess “visibility map bit” refers to the all-visible bit. Can we be explicit?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-09 17:48  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-12-09 17:48 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Dec 4, 2025 at 12:11 AM Chao Li <[email protected]> wrote:
>
> I revisited this patch again today. I reviewed 0001-0004 and have a few more comments:

Thanks for the review! v24 attached with updates you suggested as well
as the bug fix described below.

I realized my code didn't mark the heap buffer dirty if we were not
modifying it (i.e. only setting the VM). This trips an assert in
XLogRegisterBuffer() which requires that all buffers registered with
the WAL machinery are marked dirty unless REGBUF_NO_CHANGE is passed.

It wasn't possible to hit it in master because we unconditionally
dirtied the buffer if we found the VM not set in
find_next_unskippable_block() -- even if we made no changes to the
heap buffer. But my refactoring only dirtied the heap buffer if we
modified it (either pruning/freezing or setting PD_ALL_VISIBLE).

In attached v24, I once again always dirty the heap buffer before
registering it. We can't skip adding the heap buffer to the WAL chain
even if we didn't modify it, because we use it to update the freespace
map during recovery. We could pass REGBUF_NO_CHANGE when the heap
buffer is completely unmodified. But the delicate special case logic
doesn't seem worth the effort to maintain, as the only time the heap
buffer should be unmodified is when the VM has been truncated or
removed. I added a test to the commit doing this refactoring that
would have caught my mistake (0003).
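
To illustrate the invariant described above, here is a minimal
self-contained model. It is not PostgreSQL code: ToyBuffer,
TOY_REGBUF_NO_CHANGE, and the toy_* functions are hypothetical stand-ins
for the real buffer and XLogRegisterBuffer() machinery. It only shows
that a clean buffer fails the registration check unless a "no change"
flag is passed, and that dirtying unconditionally (the v24 approach)
always satisfies it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value for this sketch; not the PostgreSQL definition. */
#define TOY_REGBUF_NO_CHANGE 0x01

/* Hypothetical buffer model: only tracks whether the buffer is dirty. */
typedef struct ToyBuffer
{
	bool		dirty;
} ToyBuffer;

/*
 * Models the check described in the email: a registered buffer must be
 * dirty unless the caller explicitly says it did not change it.
 */
static bool
toy_can_register(const ToyBuffer *buf, uint8_t flags)
{
	return buf->dirty || (flags & TOY_REGBUF_NO_CHANGE) != 0;
}

/*
 * Models the v24 approach: always dirty the heap buffer before
 * registering it, even if only the VM is being set.
 */
static bool
toy_register_heap_buffer_for_vm_set(ToyBuffer *heap_buf)
{
	heap_buf->dirty = true;
	assert(toy_can_register(heap_buf, 0));
	return true;
}
```

The alternative would be to pass the "no change" flag only when the heap
buffer is truly unmodified, which is the special case the email argues is
not worth maintaining.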

I also split the refactoring of the VM setting logic into more commits
to help make it clearer (0003-0004). We could technically commit the
refactoring commits to master. I had not originally intended to do so
since they do not have independent value beyond clarity for the
reviewer.

In this set 0001 and 0002 are independent. 0003-0007 are all small
steps toward the single change in 0007 which combines the VM updates
into the same WAL record as pruning and freezing. 0008 and 0009 are
removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
needed to set the VM during on-access pruning. 0013 - 0015 are small
steps toward setting the VM on-access. And 0016 sets the prune xid on
insert so we may set the VM on-access for pages that have only new
data.

> +static bool
> +heap_page_will_set_vis(Relation relation,
>
> Actually, I wanted to comment on the new function name in the last round of review, but I guess I missed that.
>
> I was confused about what “set_vis” means, and eventually figured out that “vis” stands for “visibility”. In the function name, “vis” actually means “visibility map bits”, while in the last parameter’s name, “do_set_pd_vis”, “vis” means the “PD_ALL_VISIBLE” bit. Using “vis” for two different things is confusing.
>
> How about renaming the function to “heap_page_will_set_vm_bits” and the last parameter to “do_set_all_visible”?

I named it that way because it was responsible for telling us what we
should set the VM to _and_ if we should set PD_ALL_VISIBLE. However,
once I corrected the bug mentioned above, we always set PD_ALL_VISIBLE
if setting the VM, so I was able to remove this ambiguity. As such
I've renamed the function heap_page_will_set_vm() (and removed the
last parameter).

> + * Decide whether to set the visibility map bits for heap_blk, using
> + * information from PruneFreezeResult and all_visible_according_to_vm. This
> + * function does not actually set the VM bit or page-level hint,
> + * PD_ALL_VISIBLE.
> + *
> + * If it finds that the page-level visibility hint or VM is corrupted, it will
> + * fix them by clearing the VM bit and page hint. This does not need to be
> + * done in a critical section.
> + *
> + * Returns true if one or both VM bits should be set, along with the desired
> + * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
> + * PD_ALL_VISIBLE should be set on the heap page.
> + */
> ```
>
> This function comment mentions PD_ALL_VISIBLE twice but never mentions ALL_FROZEN, so “Returns true if one or both VM bits should be set” feels unclear. How about rephrasing it as “Returns true if the all-visible and/or all-frozen VM bits should be set”?

PD_ALL_VISIBLE is the page-level visibility hint (not the VM bit) and
there is no page level frozen hint. It doesn't mention that the VM
bits are all-visible and all-frozen, though, so I have modified the
comment a bit to make sure the all-frozen bit of the VM is mentioned.

> +        * Now handle two potential corruption cases:
> +        *
> +        * These do not need to happen in a critical section and are not
> +        * WAL-logged.
> +        *
> +        * As of PostgreSQL 9.2, the visibility map bit should never be set if the
> +        * page-level bit is clear.  However, it's possible that the bit got
> +        * cleared after heap_vac_scan_next_block() was called, so we must recheck
> +        * with buffer lock before concluding that the VM is corrupt.
> +        */
> +       else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
> +                        visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> +       {
> +               ereport(WARNING,
> +                               (errcode(ERRCODE_DATA_CORRUPTED),
> +                                errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> +                                               RelationGetRelationName(relation), heap_blk)));
> +
> +               visibilitymap_clear(relation, heap_blk, vmbuffer,
> +                                                       VISIBILITYMAP_VALID_BITS);
> +       }
> ```
>
> Here in the comment and error message, I guess “visibility map bit” refers to the all-visible bit. Can we be explicit?

This is an existing comment in lazy_scan_prune() that I simply moved.
It isn't valid for the all-frozen bit to be set unless the all-visible
bit is set. I'm not sure whether specifying which bits were set in the
warning will help users debug the corruption they are seeing. But I
think it is a reasonable suggestion to make. Perhaps it is worth
suggesting this (adding the specific vmbits to the warning message) in
a separate thread since it is an independent improvement on master?
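
As a rough sketch of what such a follow-up might involve, the
self-contained model below encodes the two rules discussed here: the
all-frozen bit is only valid together with the all-visible bit, and any
VM bit set while the page-level hint is clear indicates corruption. The
TOY_* constants and toy_* functions are hypothetical and their values are
assumptions for this example; this is not the PostgreSQL implementation
or a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values for this sketch. */
#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02

/* The all-frozen bit may only be set if the all-visible bit is also set. */
static bool
toy_vmbits_valid(uint8_t vmbits)
{
	return (vmbits & TOY_VM_ALL_FROZEN) == 0 ||
		(vmbits & TOY_VM_ALL_VISIBLE) != 0;
}

/* Corruption case from the thread: a VM bit is set but the page hint is not. */
static bool
toy_vm_corrupt(bool pd_all_visible, uint8_t vmbits)
{
	return !pd_all_visible && vmbits != 0;
}

/* A human-readable name for the bits, as a warning message might report. */
static const char *
toy_vmbits_name(uint8_t vmbits)
{
	switch (vmbits & (TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN))
	{
		case 0:
			return "none";
		case TOY_VM_ALL_VISIBLE:
			return "all-visible";
		case TOY_VM_ALL_FROZEN:
			return "all-frozen (invalid without all-visible)";
		default:
			return "all-visible, all-frozen";
	}
}

/* Check applied before setting new bits, in the spirit of the patch's Assert. */
static void
toy_check_new_vmbits(uint8_t new_vmbits)
{
	assert(toy_vmbits_valid(new_vmbits));
}
```

Whether reporting the specific bits actually helps users debug corruption
is the open question; the sketch only shows that the information is cheap
to derive.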

- Melanie


Attachments:

  [text/x-patch] v24-0001-Simplify-vacuum-visibility-assertion.patch (1.4K, 2-v24-0001-Simplify-vacuum-visibility-assertion.patch)
  download | inline diff:
From 08652e26242aceb5048d384209b49ff6d4b287d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 10:42:53 -0500
Subject: [PATCH v24 01/16] Simplify vacuum visibility assertion

Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.

Rewrite the assertion to avoid an Assert(false).

Suggested by Andres Freund.
---
 src/backend/access/heap/vacuumlazy.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65bb0568a86..984d5879947 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2028,10 +2028,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
+		Assert(heap_page_is_all_visible(vacrel->rel, buf,
+										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.all_frozen == debug_all_frozen);
 
-- 
2.43.0



  [text/x-patch] v24-0002-Add-comment-about-PD_ALL_VISIBLE-and-VM-sync.patch (1.2K, 3-v24-0002-Add-comment-about-PD_ALL_VISIBLE-and-VM-sync.patch)
  download | inline diff:
From 33e063761f30c23ce923ea485eb9cb86acee2d92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 17:32:49 -0500
Subject: [PATCH v24 02/16] Add comment about PD_ALL_VISIBLE and VM sync

The comment above heap_xlog_visible() about the critical integrity
requirement for PD_ALL_VISIBLE and the visibility map should also be in
heap_xlog_prune_freeze() where we set PD_ALL_VISIBLE.
---
 src/backend/access/heap/heapam_xlog.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..a09fb4b803a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -157,6 +157,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		/*
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit unset.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 */
 		if (vmflags & VISIBILITYMAP_VALID_BITS)
 			PageSetAllVisible(page);
 
-- 
2.43.0



  [text/x-patch] v24-0003-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (8.2K, 4-v24-0003-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
  download | inline diff:
From ed8807b5099a0066881c8b8e1690100fa71f2e90 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v24 03/16] Combine visibilitymap_set() cases in
 lazy_scan_prune()

The heap buffer is unconditionally added to the WAL chain when setting
the VM, so it must always be marked dirty.

In one of the cases in lazy_scan_prune(), we try to avoid setting
PD_ALL_VISIBLE and marking the buffer dirty again if PD_ALL_VISIBLE is
already set. There is little gain here, and if we eliminate that
condition, we can easily combine the two cases which set the VM in
lazy_scan_prune(). This is more straightforward and makes it clear that
the heap buffer must be marked dirty since it is added to the WAL chain.

In the previously separate second VM set case, the heap buffer would
always be dirty anyway -- either because we just froze a tuple and
marked the buffer dirty or because we modified the buffer between
find_next_unskippable_block() and heap_page_prune_and_freeze() and then
pruned it in heap_page_prune_and_freeze().

This commit also adds a test case to ensure we don't add code
resulting in the heap buffer not being marked dirty before being
added to the WAL chain.

XXX: is it okay to do a checkpoint in the pg_visibility test?
---
 .../pg_visibility/expected/pg_visibility.out  | 13 +++
 contrib/pg_visibility/sql/pg_visibility.sql   |  9 ++
 src/backend/access/heap/vacuumlazy.c          | 95 ++++---------------
 3 files changed, 43 insertions(+), 74 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..adc01162895 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,19 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- the heap buffer must be marked dirty before adding it to the WAL chain when
+-- setting the VM
+create table test_heap_buffer_dirty(a int);
+insert into test_heap_buffer_dirty values (1);
+vacuum (freeze) test_heap_buffer_dirty;
+checkpoint;
+select pg_truncate_visibility_map('test_heap_buffer_dirty');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+vacuum test_heap_buffer_dirty;
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..0cdd087badb 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,15 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- the heap buffer must be marked dirty before adding it to the WAL chain when
+-- setting the VM
+create table test_heap_buffer_dirty(a int);
+insert into test_heap_buffer_dirty values (1);
+vacuum (freeze) test_heap_buffer_dirty;
+checkpoint;
+select pg_truncate_visibility_map('test_heap_buffer_dirty');
+vacuum test_heap_buffer_dirty;
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 984d5879947..14040552e48 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2080,15 +2080,21 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
 		{
+			/*
+			 * We can pass InvalidTransactionId as our cutoff_xid, since a
+			 * snapshotConflictHorizon sufficient to make everything safe for
+			 * REDO was logged when the page's tuples were frozen.
+			 */
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
+			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
 		/*
@@ -2097,36 +2103,36 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * The heap page is added to the WAL chain even if it wasn't modified,
+		 * so we still need to mark it dirty. The only scenario where it isn't
+		 * modified in phase I is when the VM was truncated or removed, which
+		 * isn't worth optimizing for.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+									   new_vmbits);
 
 		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
+		 * For the purposes of logging, count whether or not the page was
+		 * newly set all-visible and, potentially, all-frozen.
 		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+			(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 		{
 			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
+			if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 			{
 				vacrel->vm_new_visible_frozen_pages++;
 				*vm_page_frozen = true;
 			}
 		}
 		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+				 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
+			Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 			vacrel->vm_new_frozen_pages++;
 			*vm_page_frozen = true;
 		}
@@ -2177,65 +2183,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v24-0004-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch (9.5K, 5-v24-0004-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch)
From 842dd8da0c38440315de4e01bda026970b42d7eb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v24 04/16] Refactor lazy_scan_prune() VM set logic into helper

While this may not be an improvement on its own, encapsulating the logic
for determining what to set the VM bits to in a helper is one step
toward setting the VM in heap_page_prune_and_freeze().
---
 src/backend/access/heap/vacuumlazy.c | 209 ++++++++++++++++-----------
 1 file changed, 126 insertions(+), 83 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14040552e48..577950c2f77 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1934,6 +1934,104 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and
+ * all_visible_according_to_vm. This function does not actually set the VM
+ * bits or page-level visibility hint, PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bits and visibility page hint. This does not
+ * need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * what bits should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+					  BlockNumber heap_blk,
+					  Buffer heap_buf,
+					  Buffer vmbuffer,
+					  bool all_visible_according_to_vm,
+					  const PruneFreezeResult *presult,
+					  uint8 *new_vmbits)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * Determine what to set the visibility map bits to based on information
+	 * from the VM (as of last heap_vac_scan_next_block() call), and from
+	 * all_visible and all_frozen variables.
+	 */
+	if ((presult->all_visible && !all_visible_according_to_vm) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -1964,6 +2062,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
+	bool		do_set_vm = false;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
@@ -2075,33 +2176,20 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			/*
-			 * We can pass InvalidTransactionId as our cutoff_xid, since a
-			 * snapshotConflictHorizon sufficient to make everything safe for
-			 * REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
+	do_set_vm = heap_page_will_set_vm(rel,
+									  blkno,
+									  buf,
+									  vmbuffer,
+									  all_visible_according_to_vm,
+									  &presult,
+									  &new_vmbits);
 
+	if (do_set_vm)
+	{
 		/*
 		 * It should never be the case that the visibility map page is set
 		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
+		 * checksums are not enabled).
 		 *
 		 * The heap page is added to the WAL chain even if it wasn't modified,
 		 * so we still need to mark it dirty. The only scenario where it isn't
@@ -2114,73 +2202,28 @@ lazy_scan_prune(LVRelState *vacrel,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   new_vmbits);
-
-		/*
-		 * For the purposes of logging, count whether or not the page was
-		 * newly set all-visible and, potentially, all-frozen.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
-			(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
-		{
-			Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
 	}
 
 	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
+		vacrel->vm_new_visible_pages++;
+		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
 	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
+		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
-- 
2.43.0



  [text/x-patch] v24-0005-Set-the-VM-in-heap_page_prune_and_freeze.patch (24.9K, 6-v24-0005-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 0d49cdd7813e02979b9e1b72eb344a93688c5d6e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v24 05/16] Set the VM in heap_page_prune_and_freeze()

This commit has no independent benefit; it is split out for ease of
review. As of this commit, a separate WAL record is still emitted for
setting the VM after pruning and freezing. Moving the logic into
pruneheap.c separately from setting the VM in the same WAL record makes
both changes easier to review.
---
 src/backend/access/heap/pruneheap.c  | 263 ++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 147 +--------------
 src/include/access/heapam.h          |  27 +++
 3 files changed, 254 insertions(+), 183 deletions(-)
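
Reviewer's note: the three-way initialization of all_visible and all_frozen that this patch adds to prune_freeze_setup() can be summarized with a small standalone sketch. It is an illustration only. The SKETCH_* option values, the SketchPruneState struct and sketch_prune_setup() are hypothetical, and the real bit values of HEAP_PAGE_PRUNE_FREEZE and HEAP_PAGE_PRUNE_UPDATE_VM are not asserted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits; the real values live in heapam.h. */
#define SKETCH_PRUNE_FREEZE     (1 << 1)
#define SKETCH_PRUNE_UPDATE_VM  (1 << 2)

/* Hypothetical subset of PruneState relevant to visibility bookkeeping. */
typedef struct SketchPruneState
{
	bool		attempt_freeze;
	bool		attempt_update_vm;
	bool		all_visible;
	bool		all_frozen;
} SketchPruneState;

/*
 * Mirrors the initialization order in the patch: freezing callers track
 * both flags, VM-only callers track all_visible but start all_frozen as
 * false (no per-tuple freeze preparation happens for them), and everyone
 * else skips the bookkeeping entirely.
 */
static void
sketch_prune_setup(int options, SketchPruneState *prstate)
{
	assert(prstate);

	prstate->attempt_freeze = (options & SKETCH_PRUNE_FREEZE) != 0;
	prstate->attempt_update_vm = (options & SKETCH_PRUNE_UPDATE_VM) != 0;

	if (prstate->attempt_freeze)
	{
		prstate->all_visible = true;
		prstate->all_frozen = true;
	}
	else if (prstate->attempt_update_vm)
	{
		prstate->all_visible = true;
		prstate->all_frozen = false;
	}
	else
	{
		prstate->all_visible = false;
		prstate->all_frozen = false;
	}
}
```

In this series VACUUM passes both options, so it takes the first branch; the on-access caller in heap_page_prune_opt() passes options = 0 and takes the last one.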

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..d7f36e2764f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,13 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(Relation relation,
+								  BlockNumber heap_blk,
+								  Buffer heap_buf,
+								  Buffer vmbuffer,
+								  bool blk_known_av,
+								  const PruneFreezeResult *presult,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +290,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
+				.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -338,6 +350,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -386,51 +400,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -765,10 +782,118 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and blk_known_av.
+ * Some callers may already have examined this page's VM bits (e.g., VACUUM in
+ * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * blk_known_av. Callers that have not previously checked the page's status in
+ * the VM should pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and page visibility
+ * hint. This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * what bits should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+					  BlockNumber heap_blk,
+					  Buffer heap_buf,
+					  Buffer vmbuffer,
+					  bool blk_known_av,
+					  const PruneFreezeResult *presult,
+					  uint8 *new_vmbits)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * Determine what the visibility map bits should be set to using the
+	 * values of all_visible and all_frozen determined during
+	 * pruning/freezing.
+	 */
+	if ((presult->all_visible && !blk_known_av) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * Callers that did not check the visibility map to determine
+	 * blk_known_av are not eligible for this check. However, the cost of
+	 * potentially needing to read the visibility map for pages that are not
+	 * all-visible is too high to justify generalizing it.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -783,12 +908,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +939,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -1001,6 +1130,48 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	presult->new_vmbits = 0;
+	presult->old_vmbits = 0;
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = false;
+	if (prstate.attempt_update_vm)
+		do_set_vm = heap_page_will_set_vm(params->relation,
+										  blockno,
+										  buffer,
+										  vmbuffer,
+										  params->blk_known_av,
+										  presult,
+										  &presult->new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || presult->new_vmbits == 0);
+
+	if (do_set_vm)
+	{
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).
+		 *
+		 * The heap page is added to the WAL chain even if it wasn't modified,
+		 * so we still need to mark it dirty. The only scenario where it isn't
+		 * modified in phase I is when the VM was truncated or removed, which
+		 * isn't worth optimizing for.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+												InvalidXLogRecPtr,
+												vmbuffer, presult->vm_conflict_horizon,
+												presult->new_vmbits);
+	}
 }
 
 
@@ -1475,6 +1646,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 577950c2f77..86822778abc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1935,103 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 }
 
 
-/*
- * Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and
- * all_visible_according_to_vm. This function does not actually set the VM
- * bits or page-level visibility hint, PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bits and visibility page hint. This does not
- * need to be done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with returning
- * what bits should be set in the VM in *new_vmbits.
- */
-static bool
-heap_page_will_set_vm(Relation relation,
-					  BlockNumber heap_blk,
-					  Buffer heap_buf,
-					  Buffer vmbuffer,
-					  bool all_visible_according_to_vm,
-					  const PruneFreezeResult *presult,
-					  uint8 *new_vmbits)
-{
-	Page		heap_page = BufferGetPage(heap_buf);
-
-	*new_vmbits = 0;
-
-	/*
-	 * Determine what to set the visibility map bits to based on information
-	 * from the VM (as of last heap_vac_scan_next_block() call), and from
-	 * all_visible and all_frozen variables.
-	 */
-	if ((presult->all_visible && !all_visible_according_to_vm) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
-	{
-		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		return true;
-	}
-
-	/*
-	 * Now handle two potential corruption cases:
-	 *
-	 * These do not need to happen in a critical section and are not
-	 * WAL-logged.
-	 *
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
-			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buf);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2062,15 +1965,14 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
-	bool		do_set_vm = false;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
+		.blk_known_av = all_visible_according_to_vm,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
@@ -2173,55 +2075,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	do_set_vm = heap_page_will_set_vm(rel,
-									  blkno,
-									  buf,
-									  vmbuffer,
-									  all_visible_according_to_vm,
-									  &presult,
-									  &new_vmbits);
-
-	if (do_set_vm)
-	{
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).
-		 *
-		 * The heap page is added to the WAL chain even if it wasn't modified,
-		 * so we still need to mark it dirty. The only scenario where it isn't
-		 * modified in phase I is when the VM was truncated or removed, which
-		 * isn't worth optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   new_vmbits);
-	}
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
-		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..bb712c5b29f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block of
+	 * the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block(). Callers which did not check the
+	 * visibility map already should pass false for blk_known_av. This is only
+	 * an optimization for callers that did check the VM and won't affect
+	 * correctness.
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
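
Reviewer note: the accounting that lazy_scan_prune() does with
presult.old_vmbits and presult.new_vmbits at the end of the patch above
can be sketched in isolation as follows. This is only a minimal sketch:
VmCounters and count_vm_transition() are hypothetical names that stand in
for the three LVRelState counters and the inline if/else chain, and the
return value plays the role of *vm_page_frozen. The bit values are the
ones defined in visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three LVRelState counters touched here */
typedef struct VmCounters
{
	int64_t		vm_new_visible_pages;
	int64_t		vm_new_visible_frozen_pages;
	int64_t		vm_new_frozen_pages;
} VmCounters;

/*
 * Hypothetical helper: classify the old -> new VM bit transition for one
 * heap page and bump the matching counter.  Returns true when the page was
 * newly set all-frozen in the VM (the value stored in *vm_page_frozen).
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page went from not all-visible to all-visible */
		c->vm_new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->vm_new_visible_frozen_pages++;
			return true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is now also all-frozen */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->vm_new_frozen_pages++;
		return true;
	}

	return false;
}
```

When heap_page_prune_and_freeze() does not touch the VM, both bit sets
stay 0, so neither branch fires and no counter moves.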



  [text/x-patch] v24-0006-Move-VM-assert-into-prune-freeze-code.patch (14.2K, 7-v24-0006-Move-VM-assert-into-prune-freeze-code.patch)
From 30e6b5420669389b0b0e6169905d344442d17266 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v24 06/16] Move VM assert into prune/freeze code

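This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assertion that cross-checks the page's
all-visible/all-frozen status against heap_page_would_be_all_visible()
out of lazy_scan_prune() and into the prune/freeze code, before the VM
is set. This allows us to remove the all_visible, all_frozen, and
vm_conflict_horizon fields from PruneFreezeResult.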
---
 src/backend/access/heap/pruneheap.c  | 138 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c |  68 +------------
 src/include/access/heapam.h          |  25 ++---
 3 files changed, 109 insertions(+), 122 deletions(-)
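
Reviewer note: the bit-selection part of heap_page_will_set_vm(), once
it reads from PruneState, can be sketched standalone as below. This is
only a rough sketch under stated assumptions: PruneStateSketch and
will_set_vm_sketch() are hypothetical names, the vm_all_frozen argument
stands in for the VM_ALL_FROZEN() lookup, and the two corruption-repair
branches are omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical subset of PruneState: only the fields consulted here */
typedef struct PruneStateSketch
{
	bool		attempt_update_vm;
	bool		all_visible;
	bool		all_frozen;
} PruneStateSketch;

/*
 * Decide which VM bits to set.  Returns true and fills *new_vmbits when at
 * least one bit needs setting; otherwise returns false with *new_vmbits == 0.
 * The corruption checks of the real function are deliberately left out.
 */
static bool
will_set_vm_sketch(const PruneStateSketch *prstate,
				   bool blk_known_av,
				   bool vm_all_frozen,	/* stand-in for VM_ALL_FROZEN() */
				   uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	/* Callers that did not ask for a VM update never set any bits */
	if (!prstate->attempt_update_vm)
	{
		assert(!prstate->all_visible && !prstate->all_frozen);
		return false;
	}

	if ((prstate->all_visible && !blk_known_av) ||
		(prstate->all_frozen && !vm_all_frozen))
	{
		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
		if (prstate->all_frozen)
			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
		return true;
	}

	return false;
}
```

Moving the attempt_update_vm test inside the function is what lets the
caller invoke it unconditionally, as the last pruneheap.c hunk does.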

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d7f36e2764f..96dc902ec12 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vm(Relation relation,
 								  Buffer heap_buf,
 								  Buffer vmbuffer,
 								  bool blk_known_av,
-								  const PruneFreezeResult *presult,
+								  const PruneState *prstate,
 								  uint8 *new_vmbits);
 
 
@@ -784,9 +784,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 
 /*
  * Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and blk_known_av.
- * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
- * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * for heap_blk using information from PruneState and blk_known_av. Some
+ * callers may already have examined this page’s VM bits (e.g., VACUUM in the
+ * previous heap_vac_scan_next_block() call) and can pass that along as
  * blk_known_av. Callers that have not previously checked the page's status in
  * the VM should pass false for blk_known_av.
  *
@@ -806,27 +806,30 @@ heap_page_will_set_vm(Relation relation,
 					  Buffer heap_buf,
 					  Buffer vmbuffer,
 					  bool blk_known_av,
-					  const PruneFreezeResult *presult,
+					  const PruneState *prstate,
 					  uint8 *new_vmbits)
 {
 	Page		heap_page = BufferGetPage(heap_buf);
 
 	*new_vmbits = 0;
 
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		return false;
+	}
+
 	/*
 	 * Determine what the visibility map bits should be set to using the
 	 * values of all_visible and all_frozen determined during
 	 * pruning/freezing.
 	 */
-	if ((presult->all_visible && !blk_known_av) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
 	{
 		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+		if (prstate->all_frozen)
 			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		return true;
 	}
@@ -873,7 +876,7 @@ heap_page_will_set_vm(Relation relation,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -889,6 +892,30 @@ heap_page_will_set_vm(Relation relation,
 	return false;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -942,6 +969,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1097,23 +1125,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1134,18 +1147,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->new_vmbits = 0;
 	presult->old_vmbits = 0;
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = false;
-	if (prstate.attempt_update_vm)
-		do_set_vm = heap_page_will_set_vm(params->relation,
-										  blockno,
-										  buffer,
-										  vmbuffer,
-										  params->blk_known_av,
-										  presult,
-										  &presult->new_vmbits);
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vm(params->relation,
+									  blockno,
+									  buffer,
+									  vmbuffer,
+									  params->blk_known_av,
+									  &prstate,
+									  &presult->new_vmbits);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
@@ -1169,7 +1231,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		MarkBufferDirty(buffer);
 		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
 												InvalidXLogRecPtr,
-												vmbuffer, presult->vm_conflict_horizon,
+												vmbuffer, vm_conflict_horizon,
 												presult->new_vmbits);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 86822778abc..9f404e03869 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2016,32 +2002,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3496,29 +3456,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3542,15 +3479,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bb712c5b29f..392af6503da 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v24-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.2K, 8-v24-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From b191695afcba438ae8c5d1c3b4d5939c76d22a4f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v24 07/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 255 ++++++++++++++++------------
 1 file changed, 144 insertions(+), 111 deletions(-)
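
Reviewer note: the conflict-horizon selection that get_conflict_xid()
performs for the combined record can be tried out standalone with the
sketch below. It is an illustration under stated assumptions, not the
patch's code: conflict_xid_sketch() is a hypothetical name, and
xid_follows() is a simplified local reimplementation of
TransactionIdFollows() (wraparound-aware comparison for normal XIDs,
plain comparison when either XID is a special one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Simplified stand-in for TransactionIdFollows(): is id1 newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special XIDs (including Invalid) compare as plain unsigned values */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;

	/* Normal XIDs compare modulo 2^32 */
	return (int32_t) (id1 - id2) > 0;
}

static TransactionId
conflict_xid_sketch(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid,
					bool blk_already_av)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Precondition the caller asserts before computing the horizon */
	assert(do_set_vm || new_vmbits == 0);

	/*
	 * Only marking an already all-visible page all-frozen: nothing on the
	 * page can be invisible to a standby snapshot, so no horizon is needed.
	 */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
		return InvalidTransactionId;

	/* Start from the VM horizon if setting the VM, else the freeze horizon */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer xmax among the removed tuples takes precedence */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

For example, pruning a tuple whose xmax is 300 on a page that is frozen
with horizon 250 and not set in the VM yields 300, because the removed
xmax follows the freeze horizon.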

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 96dc902ec12..489b8487599 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid,
+									  bool blk_already_av);
 static bool heap_page_will_set_vm(Relation relation,
 								  BlockNumber heap_blk,
 								  Buffer heap_buf,
@@ -782,6 +788,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Decide whether to set the visibility map bits (all-visible and all-frozen)
  * for heap_blk using information from PruneState and blk_known_av. Some
@@ -969,7 +1033,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -977,6 +1040,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1038,6 +1104,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vm(params->relation,
+									  blockno, buffer, vmbuffer, params->blk_known_av,
+									  &prstate, &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+									prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1059,14 +1155,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1080,6 +1179,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		if (do_set_vm)
+		{
+			PageSetAllVisible(page);
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			Assert(old_vmbits != new_vmbits);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1087,29 +1195,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1119,46 +1210,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	presult->new_vmbits = 0;
-	presult->old_vmbits = 0;
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1173,7 +1226,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1183,56 +1237,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	/*
-	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
-	 * based on information from the VM and the all_visible/all_frozen flags.
-	 *
-	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
-	 * VM bit is clear, we strongly prefer to keep them in sync.
-	 *
-	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
-	 * already been set. Setting only the VM is most common when setting an
-	 * already all-visible page all-frozen.
-	 */
-	do_set_vm = heap_page_will_set_vm(params->relation,
-									  blockno,
-									  buffer,
-									  vmbuffer,
-									  params->blk_known_av,
-									  &prstate,
-									  &presult->new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || presult->new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).
-		 *
-		 * The heap page is added to the WAL chain even if it wasn't modified,
-		 * so we still need to mark it dirty. The only scenario where it isn't
-		 * modified in phase I is when the VM was truncated or removed, which
-		 * isn't worth optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
-												InvalidXLogRecPtr,
-												vmbuffer, vm_conflict_horizon,
-												presult->new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
 }
 
-- 
2.43.0
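
A note on the conflict_xid selection in the hunk above: it keeps whichever
of frz_conflict_horizon and latest_xid_removed is newer, as decided by
TransactionIdFollows(). The standalone sketch below illustrates that
comparison under wraparound. It is not PostgreSQL code: ToyXid and the
toy_* helpers are hypothetical names, and the modulo-2^32 rule (with
plain unsigned comparison for the special XIDs below 3) is my
understanding of how the real function behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId (a 32-bit counter that wraps). */
typedef uint32_t ToyXid;

/* Assumption: XIDs below 3 are special (invalid, bootstrap, frozen). */
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)

/*
 * Returns true if 'a' is logically later than 'b'.  Normal XIDs are
 * compared modulo 2^32 via a signed difference; special XIDs fall back
 * to a plain unsigned comparison, so they sort before any normal XID.
 */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (a < TOY_FIRST_NORMAL_XID || b < TOY_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Pick the newer of the two horizons, mirroring the if/else in the hunk. */
static ToyXid
toy_conflict_xid(ToyXid frz_conflict_horizon, ToyXid latest_xid_removed)
{
	if (toy_xid_follows(frz_conflict_horizon, latest_xid_removed))
		return frz_conflict_horizon;
	return latest_xid_removed;
}
```

With these definitions, toy_conflict_xid(10, 4294967290u) yields 10,
because 10 is sixteen XIDs past 4294967290 once the counter wraps, while
an invalid (zero) horizon on either side always loses to a normal XID.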



  [text/x-patch] v24-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 9-v24-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 9f13a78c6bf5d6deda758b623b7790c45317ad6f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v24 08/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f404e03869..6107777097d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1891,13 +1894,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v24-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.4K, 10-v24-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From f21dc2adabecf6404a7cf96c1e9254dbe77fa613 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v24 09/16] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record type.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   6 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 46 insertions(+), 375 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..3ad78ba4694 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a09fb4b803a..b66736ea282 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 489b8487599..ab354add711 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1182,9 +1182,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_set_vm)
 		{
 			PageSetAllVisible(page);
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			Assert(old_vmbits != new_vmbits);
 		}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6107777097d..b73dbdbe4ed 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2780,9 +2780,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..1819f3dbb77 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4306,7 +4306,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v24-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.1K, 11-v24-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From 0e0143836cef4f45e89a29ebcceed4a94fbff1d9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v24 10/16] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab354add711..00016a0c1dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -252,7 +252,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -486,7 +486,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1299,11 +1299,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1762,7 +1762,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v24-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.6K, 12-v24-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From ff5e29b04980eb5ae8c96f32423683dfc26cebd7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v24 11/16] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)
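
To make the deferred check described in the commit message concrete, here
is a minimal standalone sketch. It is not code from the patch: ToyXid,
ToyVisState, toy_xid_visible_to_all() and toy_page_all_visible() are
hypothetical names, XIDs are plain integers with no wraparound, and the
horizon is a single value rather than a real GlobalVisState. It only shows
the shape of the idea: track the newest xmin cheaply while scanning, then
run the more expensive visibility test once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT PostgreSQL's types: no XID wraparound is modeled. */
typedef uint32_t ToyXid;

#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct ToyVisState
{
    ToyXid      horizon;    /* XIDs strictly below this are visible to all */
    int         nchecks;    /* how often the "expensive" test was run */
} ToyVisState;

/* Stand-in for the comparatively expensive GlobalVisState test. */
static bool
toy_xid_visible_to_all(ToyVisState *vis, ToyXid xid)
{
    vis->nchecks++;
    return xid < vis->horizon;
}

/*
 * Decide whether a page whose live tuples carry the given xmins can be
 * treated as all-visible.  The loop only tracks the newest xmin; the
 * horizon test runs once, on that newest xmin, after the loop.
 */
static bool
toy_page_all_visible(ToyVisState *vis, const ToyXid *xmins, size_t ntuples)
{
    ToyXid      newest_xmin = TOY_INVALID_XID;

    for (size_t i = 0; i < ntuples; i++)
    {
        assert(xmins[i] != TOY_INVALID_XID);
        if (xmins[i] > newest_xmin)
            newest_xmin = xmins[i];
    }

    /* Simplification: with no tracked xmin, nothing blocks all-visible. */
    if (newest_xmin == TOY_INVALID_XID)
        return true;

    return toy_xid_visible_to_all(vis, newest_xmin);
}
```

In this toy model a page with xmins {10, 50, 99} and a horizon of 100 costs
one horizon test rather than three. The trade-off the commit message
mentions is also visible: the loop always walks every tuple, even when an
early tuple would already have ruled the page out.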

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 00016a0c1dd..7e628d4ad59 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -449,11 +449,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -965,14 +966,13 @@ heap_page_will_set_vm(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1058,6 +1058,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1227,10 +1237,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1759,20 +1768,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b73dbdbe4ed..587cf906fe6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2734,7 +2734,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3493,7 +3493,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3509,7 +3509,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3583,7 +3583,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3602,7 +3602,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 392af6503da..b6f1b3fb448 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -445,7 +444,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -459,6 +458,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v24-0012-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 13-v24-0012-Unset-all_visible-sooner-if-not-freezing.patch)
From 4869ad945d582cbe7dd57b5bb5ef458187ba4f64 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v24 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
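
A small standalone model of the control flow this patch changes, as I read
the commit message. ToyPruneState and the toy_* functions are hypothetical
stand-ins, not the real PruneState or pruneheap.c helpers. The
bookkeeping_steps counter and the rule that bookkeeping only runs while
all_visible is still true are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the few PruneState fields this patch touches. */
typedef struct ToyPruneState
{
    bool        attempt_freeze;
    bool        all_visible;
    bool        all_frozen;
    int         lpdead_items;
    int         bookkeeping_steps;  /* hypothetical counter */
} ToyPruneState;

/* Mirrors the branch added to the two "record dead" helpers. */
static void
toy_record_dead(ToyPruneState *ps)
{
    if (!ps->attempt_freeze)
    {
        ps->all_visible = false;
        ps->all_frozen = false;
    }
    ps->lpdead_items++;
}

/*
 * Assumption: per-tuple visibility bookkeeping is only done while the
 * page is still a candidate for being all-visible.
 */
static void
toy_record_live(ToyPruneState *ps)
{
    if (ps->all_visible)
        ps->bookkeeping_steps++;
}

/* The delayed clearing done at the end of page processing. */
static void
toy_finish_page(ToyPruneState *ps)
{
    assert(ps->lpdead_items >= 0);
    if (ps->lpdead_items > 0)
    {
        ps->all_visible = false;
        ps->all_frozen = false;
    }
}
```

Feeding one dead item followed by two live tuples through both modes gives
the same final flags in either case. With attempt_freeze false, though, the
live tuples skip the bookkeeping, which is the saving the commit message
describes.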

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7e628d4ad59..4ed2eff5e05 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1625,8 +1625,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1885,8 +1890,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v24-0013-Track-which-relations-are-modified-by-a-query.patch (2.5K, 14-v24-0013-Track-which-relations-are-modified-by-a-query.patch)
From 031b80ff688f33c0be14037d7b5e7a06cc9d6aef Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v24 13/16] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)
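
For readers unfamiliar with the pattern, a toy version of "collect the
modified range-table indexes, then ask whether a given scan's relation is
among them" could look like the sketch below. ToyRelidSet is a hypothetical
64-bit stand-in for a Bitmapset, and toy_scan_may_set_vm() is an assumed
consumer; the actual consumer arrives in a later patch of the series and
may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bitmap of range-table indexes; supports RT indexes 1..63 only. */
typedef uint64_t ToyRelidSet;

/* Functional-style add, in the spirit of set = add_member(set, x). */
static ToyRelidSet
toy_add_member(ToyRelidSet set, int rti)
{
    assert(rti >= 1 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

static bool
toy_is_member(int rti, ToyRelidSet set)
{
    assert(rti >= 1 && rti < 64);
    return (set & (UINT64_C(1) << rti)) != 0;
}

/*
 * Hypothetical consumer: a scan is allowed to set the visibility map
 * only if the query does not also modify the scanned relation.
 */
static bool
toy_scan_may_set_vm(ToyRelidSet modified_relids, int scan_rti)
{
    return !toy_is_member(scan_rti, modified_relids);
}
```

With RT index 1 as a result relation and RT index 3 carrying a rowmark,
only a scan of RT index 2 would be allowed to set the VM in this model.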

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v24-0014-Pass-down-information-on-table-modification-to-s.patch (23.0K, 15-v24-0014-Pass-down-information-on-table-modification-to-s.patch)
From 00c65fad7e817a2a10ec47272bbe990c50502078 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v24 14/16] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  4 ++--
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 91 insertions(+), 44 deletions(-)
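
One property of the flags plumbing worth spelling out: because the hint is
phrased as "read only", every caller that passes 0 is conservatively
treated as possibly modifying the relation. The sketch below models that
with hypothetical names and bit values. Only the name SO_HINT_REL_READ_ONLY
and the negated test come from the diff; the TOY_* constants, ToyIndexFetch
and both functions are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define TOY_SO_TYPE_SEQSCAN         (UINT32_C(1) << 0)
#define TOY_SO_HINT_REL_READ_ONLY   (UINT32_C(1) << 8)

typedef struct ToyIndexFetch
{
    bool        modifies_base_rel;
} ToyIndexFetch;

/* Same negated test as in heapam_index_fetch_begin() in the diff. */
static ToyIndexFetch
toy_index_fetch_begin(uint32_t flags)
{
    ToyIndexFetch fetch;

    fetch.modifies_base_rel = !(flags & TOY_SO_HINT_REL_READ_ONLY);
    return fetch;
}

/*
 * Models the change to table_beginscan_parallel(): the fixed scan-type
 * bits are OR'ed into whatever hint bits the caller passed, so the
 * caller's hints survive.
 */
static uint32_t
toy_beginscan_parallel_flags(uint32_t flags)
{
    /* callers are expected to pass hint bits only */
    assert((flags & TOY_SO_TYPE_SEQSCAN) == 0);
    flags |= TOY_SO_TYPE_SEQSCAN;
    return flags;
}
```

In this model the utility callers that pass a literal 0 keep today's
behavior, and only a caller that explicitly sets the read-only hint opts
in.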

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index f87c60a230c..645688f9241 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..d7fac94826d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c96917085c2..9d425504e1b 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..58dbbf4d851 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 47d5047fe8b..055759cd343 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index dd323c9b9fd..b41bfeca244 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b6f1b3fb448..480a1bd654f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
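
Note on the flag plumbing in the patch above: each begin-scan wrapper now takes caller-supplied `flags` and ORs in the bits it always needs, and the heap AM stores the read-only hint inverted as `modifies_base_rel`. The sketch below models only that shape. The `DEMO_`/`demo_` names, the bit values, and the assertion that callers never pass wrapper-owned bits are assumptions made for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for ScanOptions bits; names and values are assumptions. */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 8,
	DEMO_SO_HINT_REL_READ_ONLY = 1 << 10
};

/* Bits that only the wrapper itself is expected to set (assumption). */
#define DEMO_INTERNAL_FLAGS (DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE)

/*
 * Same shape as the patched table_beginscan(): the caller passes optional
 * hint bits, and the wrapper ORs in the bits it always requires.
 */
static uint32_t
demo_beginscan_flags(uint32_t flags)
{
	/* Hypothetical sanity check: callers pass hints, not scan-type bits. */
	assert((flags & DEMO_INTERNAL_FLAGS) == 0);

	flags |= DEMO_INTERNAL_FLAGS;
	return flags;
}

/*
 * Same shape as heapam_index_fetch_begin(): the hint is stored inverted,
 * so a scan without the hint is treated as one that may modify the rel.
 */
static int
demo_modifies_base_rel(uint32_t flags)
{
	return !(flags & DEMO_SO_HINT_REL_READ_ONLY);
}
```

Passing 0, as most call sites in the patch do, keeps the previous behavior: no hint is set, so the scan is conservatively assumed to modify the relation.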



  [text/x-patch] v24-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (13.3K, 16-v24-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 352e4756fb7641a0142a7e2a1a0826d81427b935 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v24 15/16] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 74 ++++++++++++++++---
 src/include/access/heapam.h                   | 24 +++++-
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3ad78ba4694..ecc04390ac7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d7fac94826d..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed2eff5e05..15239e0cbbd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vm(Relation relation,
 								  Buffer heap_buf,
 								  Buffer vmbuffer,
 								  bool blk_known_av,
-								  const PruneState *prstate,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
+								  PruneState *prstate,
 								  uint8 *new_vmbits);
 
 
@@ -220,9 +222,13 @@ static bool heap_page_will_set_vm(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -304,6 +310,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -862,6 +875,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
  * corrupted, it will fix them by clearing the VM bits and page visibility
  * hint. This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * desired what bits should be set in the VM in *new_vmbits.
  */
@@ -871,7 +887,9 @@ heap_page_will_set_vm(Relation relation,
 					  Buffer heap_buf,
 					  Buffer vmbuffer,
 					  bool blk_known_av,
-					  const PruneState *prstate,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
+					  PruneState *prstate,
 					  uint8 *new_vmbits)
 {
 	Page		heap_page = BufferGetPage(heap_buf);
@@ -884,6 +902,24 @@ heap_page_will_set_vm(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	/*
 	 * Determine what the visibility map bits should be set to using the
 	 * values of all_visible and all_frozen determined during
@@ -906,14 +942,15 @@ heap_page_will_set_vm(Relation relation,
 	 * WAL-logged.
 	 *
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
 	 *
 	 * Callers which did not check the visibility map and determine
 	 * blk_known_av will not be eligible for this, however the cost of
 	 * potentially needing to read the visibility map for pages that are not
-	 * all-visible is too high to justify generalizing the check.
+	 * all-visible is too high to justify generalizing the check. A future
+	 * vacuum will have to take care of fixing the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1129,6 +1166,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vm(params->relation,
 									  blockno, buffer, vmbuffer, params->blk_known_av,
+									  params->reason, do_prune, do_freeze,
 									  &prstate, &new_vmbits);
 
 	/*
@@ -1195,15 +1233,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			old_vmbits = visibilitymap_set(blockno,
 										   vmbuffer, new_vmbits,
 										   params->relation->rd_locator);
-			Assert(old_vmbits != new_vmbits);
+
+			/*
+			 * If on-access pruning set the VM in between when vacuum first
+			 * checked the visibility map and determined blk_known_av and when
+			 * we actually prune the page, we could end up trying to set the
+			 * VM only to find it is already set.
+			 */
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
 		}
 
 		MarkBufferDirty(buffer);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only planning to update the VM, and it turns out that it was
+		 * already set, there is no need to emit WAL. As such, we must check
+		 * again that some change is required.
 		 */
-		if (RelationNeedsWAL(params->relation))
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
 		{
 			log_heap_prune_and_freeze(params->relation, buffer,
 									  do_set_vm ? vmbuffer : InvalidBuffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 480a1bd654f..89538652566 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
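
Note on the two decisions added in pruneheap.c above: the on-access heuristic that declines to set the VM, and the recheck that suppresses WAL when the VM bits were already set. The sketch below restates both as pure functions over booleans so the conditions can be read in isolation. All `demo_`/`Demo` names are hypothetical, and the real code consults buffer and WAL state (`BufferIsDirty()`, `XLogCheckBufferNeedsBackup()`) rather than taking flags as arguments. This is a simplified model, not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PruneReason; only two values are modeled. */
typedef enum
{
	DEMO_PRUNE_ON_ACCESS,
	DEMO_PRUNE_VACUUM_SCAN
} DemoPruneReason;

/*
 * True when an on-access call should decline to set the VM: nothing is
 * being pruned or frozen, and setting the VM would either newly dirty the
 * heap page or force a full-page image of an already-dirty page.
 */
static bool
demo_skip_vm_set(DemoPruneReason reason, bool all_visible,
				 bool do_prune, bool do_freeze,
				 bool heap_dirty, bool needs_fpi)
{
	return reason == DEMO_PRUNE_ON_ACCESS &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!heap_dirty || needs_fpi);
}

/*
 * True when a prune record should be emitted. If the only planned change
 * was a VM update and the bits were already set, nothing changed.
 */
static bool
demo_should_emit_wal(bool rel_needs_wal, bool do_prune, bool do_freeze,
					 bool do_set_vm, unsigned old_vmbits, unsigned new_vmbits)
{
	/* The VM stores two bits per heap page: all-visible and all-frozen. */
	assert((new_vmbits & ~0x3u) == 0);

	if (do_set_vm && old_vmbits == new_vmbits)
		do_set_vm = false;

	return rel_needs_wal && (do_prune || do_freeze || do_set_vm);
}
```

Read this way, a clean all-visible page touched by a read-only scan with nothing to prune stays untouched, while a page that is already dirty and needs no FPI can have its VM bits set without emitting a new full-page image.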



  [text/x-patch] v24-0016-Set-pd_prune_xid-on-insert.patch (6.7K, 17-v24-0016-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 44126b8cbf2f4a353a82d380f1c636db72087db4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v24 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ecc04390ac7..d5f3f897dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b66736ea282..5c8dc2718ce 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
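
For readers following along, here is a minimal standalone sketch of the
hint-setting rule that the 0016 patch adds to heap_multi_insert(). It is
a model, not PostgreSQL code: ModelPage, xid_is_normal(), xid_precedes(),
model_page_set_prunable() and model_insert_sets_hint() are hypothetical
names, and the body of model_page_set_prunable() is written from my
understanding of the PageSetPrunable() macro in bufpage.h (keep the
oldest normal XID as the hint), which is an assumption rather than
something shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for a heap page; only the prune hint is modeled. */
typedef struct ModelPage
{
	TransactionId pd_prune_xid;
} ModelPage;

/* Bootstrap (1) and frozen (2) XIDs are not "normal". */
bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware comparison; only meaningful for two normal XIDs. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Assumed behavior of PageSetPrunable(): remember the oldest XID seen. */
void
model_page_set_prunable(ModelPage *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Model of the check added in heap_multi_insert(): skip the hint when the
 * page is being set frozen (COPY FREEZE) or when running with a
 * non-normal XID, as in bootstrap mode.
 */
void
model_insert_sets_hint(ModelPage *page, TransactionId xid,
					   bool all_frozen_set)
{
	if (!all_frozen_set && xid_is_normal(xid))
		model_page_set_prunable(page, xid);
}
```

heap_insert() uses the same rule without the all_frozen_set condition.
In this model a later insert by a newer XID leaves the hint unchanged,
so the hint reflects the oldest inserter on the page.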




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-10 23:35  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-12-10 23:35 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
<[email protected]> wrote:
>
> In this set 0001 and 0002 are independent. 0003-0007 are all small
> steps toward the single change in 0007 which combines the VM updates
> into the same WAL record as pruning and freezing. 0008 and 0009 are
> removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
> needed to set the VM during on-access pruning. 0013 - 0015 are small
> steps toward setting the VM on-access. And 0016 sets the prune xid on
> insert so we may set the VM on-access for pages that have only new
> data.

I committed 0001 and 0002. Attached v25 reflects that:

0001-0004: refactoring steps toward eliminating the XLOG_HEAP2_VISIBLE
record from vacuum phase I (probably not independent commits in the end)
0005: eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
0006-0007: remove the rest of XLOG_HEAP2_VISIBLE
0008-0010: refactoring needed to set the VM on-access
0011-0013: set the VM on-access
0014: set pd_prune_xid on insert

- Melanie


Attachments:

  [application/x-patch] v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (8.7K, 2-v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
From 0863c5db56d8f62cf525e8b98ab71245e27c17a6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v25 01/14] Combine visibilitymap_set() cases in
 lazy_scan_prune()

The heap buffer is unconditionally added to the WAL chain when setting
the VM, so it must always be marked dirty.

In one of the cases in lazy_scan_prune(), we try to avoid setting
PD_ALL_VISIBLE and marking the buffer dirty again if PD_ALL_VISIBLE is
already set. There is little gain here, and if we eliminate that
condition, we can easily combine the two cases which set the VM in
lazy_scan_prune(). This is more straightforward and makes it clear that
the heap buffer must be marked dirty since it is added to the WAL chain.

In the previously separate second VM set case, the heap buffer would
always be dirty anyway -- either because we just froze a tuple and
marked the buffer dirty or because we modified the buffer between
find_next_unskippable_block() and heap_page_prune_and_freeze() and then
pruned it in heap_page_prune_and_freeze().

This commit also adds a test case for vacuum when it does not need to
modify the heap page. Currently, the test exercises the path that marks
the heap buffer dirty before adding it to the WAL chain. If we ever
remove the heap buffer from the VM-set WAL chain or register it with
REGBUF_NO_CHANGES, the test would cover that case as well.
---
 .../pg_visibility/expected/pg_visibility.out  | 17 ++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 13 +++
 src/backend/access/heap/vacuumlazy.c          | 91 ++++---------------
 3 files changed, 48 insertions(+), 73 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..3608f801eee 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,23 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..6af7c179df0 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,19 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e8c99c3773d..38a1268b004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2094,15 +2094,21 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
 		{
+			/*
+			 * We can pass InvalidTransactionId as our cutoff_xid, since a
+			 * snapshotConflictHorizon sufficient to make everything safe for
+			 * REDO was logged when the page's tuples were frozen.
+			 */
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
+			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
 		/*
@@ -2111,35 +2117,33 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * The heap page is added to the WAL chain even if it wasn't modified,
+		 * so we still need to mark it dirty. The only scenario where it isn't
+		 * modified in phase I is when the VM was truncated or removed, which
+		 * isn't worth optimizing for.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+									   new_vmbits);
 
 		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
+		 * For the purposes of logging, count whether or not the page was
+		 * newly set all-visible and, potentially, all-frozen.
 		 */
 		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
+			if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 			{
 				vacrel->vm_new_visible_frozen_pages++;
 				*vm_page_frozen = true;
 			}
 		}
 		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+				 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_frozen_pages++;
 			*vm_page_frozen = true;
@@ -2191,65 +2195,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [application/x-patch] v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch (9.4K, 3-v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch)
From e697a3c859e18365d14d7c754c47ecccfb0f8e9a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v25 02/14] Refactor lazy_scan_prune() VM set logic into helper

While this may not be an improvement on its own, encapsulating the logic
for determining what to set the VM bits to in a helper is one step
toward setting the VM in heap_page_prune_and_freeze().
---
 src/backend/access/heap/vacuumlazy.c | 207 ++++++++++++++++-----------
 1 file changed, 126 insertions(+), 81 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 38a1268b004..6d5d708352e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1948,6 +1948,104 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and
+ * all_visible_according_to_vm. This function does not actually set the VM
+ * bits or page-level visibility hint, PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bits and visibility page hint. This does not
+ * need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * what bits should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+					  BlockNumber heap_blk,
+					  Buffer heap_buf,
+					  Buffer vmbuffer,
+					  bool all_visible_according_to_vm,
+					  const PruneFreezeResult *presult,
+					  uint8 *new_vmbits)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * Determine what to set the visibility map bits to based on information
+	 * from the VM (as of last heap_vac_scan_next_block() call), and from
+	 * all_visible and all_frozen variables.
+	 */
+	if ((presult->all_visible && !all_visible_according_to_vm) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -1978,6 +2076,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
+	bool		do_set_vm = false;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
@@ -2089,33 +2190,20 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			/*
-			 * We can pass InvalidTransactionId as our cutoff_xid, since a
-			 * snapshotConflictHorizon sufficient to make everything safe for
-			 * REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
+	do_set_vm = heap_page_will_set_vm(rel,
+									  blkno,
+									  buf,
+									  vmbuffer,
+									  all_visible_according_to_vm,
+									  &presult,
+									  &new_vmbits);
 
+	if (do_set_vm)
+	{
 		/*
 		 * It should never be the case that the visibility map page is set
 		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
+		 * checksums are not enabled).
 		 *
 		 * The heap page is added to the WAL chain even if it wasn't modified,
 		 * so we still need to mark it dirty. The only scenario where it isn't
@@ -2128,71 +2216,28 @@ lazy_scan_prune(LVRelState *vacrel,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
 									   new_vmbits);
-
-		/*
-		 * For the purposes of logging, count whether or not the page was
-		 * newly set all-visible and, potentially, all-frozen.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
 	}
 
 	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
+		vacrel->vm_new_visible_pages++;
+		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
 	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
+		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
-- 
2.43.0



  [application/x-patch] v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patch (24.9K, 4-v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 2253e4f0982072516ce5da65e9cbadff818836e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v25 03/14] Set the VM in heap_page_prune_and_freeze()

This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
 src/backend/access/heap/pruneheap.c  | 263 ++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 147 +--------------
 src/include/access/heapam.h          |  27 +++
 3 files changed, 254 insertions(+), 183 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..d7f36e2764f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,13 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(Relation relation,
+								  BlockNumber heap_blk,
+								  Buffer heap_buf,
+								  Buffer vmbuffer,
+								  bool blk_known_av,
+								  const PruneFreezeResult *presult,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +290,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
+				.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -338,6 +350,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -386,51 +400,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -765,10 +782,118 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and blk_known_av.
+ * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
+ * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * blk_known_av. Callers that have not previously checked the page's status in
+ * the VM should pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and page visibility
+ * hint. This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, and returns the bits
+ * that should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+					  BlockNumber heap_blk,
+					  Buffer heap_buf,
+					  Buffer vmbuffer,
+					  bool blk_known_av,
+					  const PruneFreezeResult *presult,
+					  uint8 *new_vmbits)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * Determine what the visibility map bits should be set to using the
+	 * values of all_visible and all_frozen determined during
+	 * pruning/freezing.
+	 */
+	if ((presult->all_visible && !blk_known_av) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * Callers which did not check the visibility map and determine
+	 * blk_known_av will not be eligible for this; however, the cost of
+	 * potentially needing to read the visibility map for pages that are not
+	 * all-visible is too high to justify generalizing the check.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -783,12 +908,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +939,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -1001,6 +1130,48 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	presult->new_vmbits = 0;
+	presult->old_vmbits = 0;
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = false;
+	if (prstate.attempt_update_vm)
+		do_set_vm = heap_page_will_set_vm(params->relation,
+										  blockno,
+										  buffer,
+										  vmbuffer,
+										  params->blk_known_av,
+										  presult,
+										  &presult->new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || presult->new_vmbits == 0);
+
+	if (do_set_vm)
+	{
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).
+		 *
+		 * The heap page is added to the WAL chain even if it wasn't modified,
+		 * so we still need to mark it dirty. The only scenario where it isn't
+		 * modified in phase I is when the VM was truncated or removed, which
+		 * isn't worth optimizing for.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+												InvalidXLogRecPtr,
+												vmbuffer, presult->vm_conflict_horizon,
+												presult->new_vmbits);
+	}
 }
 
 
@@ -1475,6 +1646,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6d5d708352e..81ef81cb8f3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,103 +1949,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 }
 
 
-/*
- * Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and
- * all_visible_according_to_vm. This function does not actually set the VM
- * bits or page-level visibility hint, PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bits and visibility page hint. This does not
- * need to be done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with returning
- * what bits should be set in the VM in *new_vmbits.
- */
-static bool
-heap_page_will_set_vm(Relation relation,
-					  BlockNumber heap_blk,
-					  Buffer heap_buf,
-					  Buffer vmbuffer,
-					  bool all_visible_according_to_vm,
-					  const PruneFreezeResult *presult,
-					  uint8 *new_vmbits)
-{
-	Page		heap_page = BufferGetPage(heap_buf);
-
-	*new_vmbits = 0;
-
-	/*
-	 * Determine what to set the visibility map bits to based on information
-	 * from the VM (as of last heap_vac_scan_next_block() call), and from
-	 * all_visible and all_frozen variables.
-	 */
-	if ((presult->all_visible && !all_visible_according_to_vm) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
-	{
-		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		return true;
-	}
-
-	/*
-	 * Now handle two potential corruption cases:
-	 *
-	 * These do not need to happen in a critical section and are not
-	 * WAL-logged.
-	 *
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
-			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buf);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
-	bool		do_set_vm = false;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
+		.blk_known_av = all_visible_according_to_vm,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
@@ -2187,55 +2089,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	do_set_vm = heap_page_will_set_vm(rel,
-									  blkno,
-									  buf,
-									  vmbuffer,
-									  all_visible_according_to_vm,
-									  &presult,
-									  &new_vmbits);
-
-	if (do_set_vm)
-	{
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).
-		 *
-		 * The heap page is added to the WAL chain even if it wasn't modified,
-		 * so we still need to mark it dirty. The only scenario where it isn't
-		 * modified in phase I is when the VM was truncated or removed, which
-		 * isn't worth optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   new_vmbits);
-	}
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
-		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..bb712c5b29f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block of
+	 * the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block(). Callers which did not check the
+	 * visibility map already should pass false for blk_known_av. This is only
+	 * an optimization for callers that did check the VM and won't affect
+	 * correctness.
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
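
The initialization rule that the patch above adds to prune_freeze_setup() can be sketched in isolation. The following is only a minimal model, not the patch's code: DemoPruneState and demo_setup() are hypothetical names, and only the two option flag values are copied from the heapam.h hunk. It shows why all_frozen starts out false when HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE: nothing would ever clear it in that mode, because heap_prepare_freeze_tuple() is not called for each tuple.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch's heapam.h hunk. */
#define HEAP_PAGE_PRUNE_FREEZE		(1 << 1)
#define HEAP_PAGE_PRUNE_UPDATE_VM	(1 << 2)

/* Hypothetical stand-in for the PruneState fields involved here. */
typedef struct DemoPruneState
{
	bool		attempt_freeze;
	bool		attempt_update_vm;
	bool		all_visible;
	bool		all_frozen;
} DemoPruneState;

/*
 * Mirror of the three-way initialization: start optimistic only for the
 * properties that later per-tuple processing is able to disprove.
 */
static void
demo_setup(DemoPruneState *st, int options)
{
	st->attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
	st->attempt_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;

	if (st->attempt_freeze)
	{
		/* Freezing examines every tuple, so both can be tracked. */
		st->all_visible = true;
		st->all_frozen = true;
	}
	else if (st->attempt_update_vm)
	{
		/* No freeze checks will run, so all_frozen must start false. */
		st->all_visible = true;
		st->all_frozen = false;
	}
	else
	{
		/* Neither requested: skip the bookkeeping entirely. */
		st->all_visible = false;
		st->all_frozen = false;
	}
}
```

With this model, VACUUM's option set (both flags) starts with both fields true, while a caller passing only HEAP_PAGE_PRUNE_UPDATE_VM can at most end up setting the all-visible bit.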



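As a reading aid for the patch above, here is a rough sketch of the first branch of heap_page_will_set_vm() and of the counting that lazy_scan_prune() performs on old_vmbits/new_vmbits. It rests on several assumptions: demo_will_set_vm(), demo_count() and DemoVmCounters are hypothetical names; the vm_all_frozen parameter stands in for the VM_ALL_FROZEN() lookup; and the two corruption checks, the buffers and the WAL logging are omitted. The bit values match those in PostgreSQL's visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Visibility map bit values, as in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/*
 * Decide which VM bits to set. vm_all_frozen is an assumed stand-in for
 * the result of VM_ALL_FROZEN() on the heap block.
 */
static bool
demo_will_set_vm(bool all_visible, bool all_frozen,
				 bool blk_known_av, bool vm_all_frozen,
				 uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	if ((all_visible && !blk_known_av) ||
		(all_frozen && !vm_all_frozen))
	{
		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
		if (all_frozen)
			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
		return true;
	}

	return false;
}

/* Hypothetical stand-in for the LVRelState counters touched by the patch. */
typedef struct DemoVmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} DemoVmCounters;

/* Count pages newly set all-visible and/or all-frozen. */
static void
demo_count(DemoVmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
			c->new_visible_frozen_pages++;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is now also all-frozen. */
		c->new_frozen_pages++;
	}
}
```

Because new_vmbits stays 0 whenever the function returns false, a caller can feed old_vmbits and new_vmbits straight into the counting step without checking whether the VM was touched; neither branch fires in that case.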
  [application/x-patch] v25-0004-Move-VM-assert-into-prune-freeze-code.patch (14.4K, 5-v25-0004-Move-VM-assert-into-prune-freeze-code.patch)
  download | inline diff:
From 0076da14668b81986c7db9b6eeb464f70fc3870d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v25 04/14] Move VM assert into prune/freeze code

This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
 src/backend/access/heap/pruneheap.c  | 134 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c |  68 +-------------
 src/include/access/heapam.h          |  25 ++---
 3 files changed, 104 insertions(+), 123 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d7f36e2764f..1b0273c02c9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vm(Relation relation,
 								  Buffer heap_buf,
 								  Buffer vmbuffer,
 								  bool blk_known_av,
-								  const PruneFreezeResult *presult,
+								  const PruneState *prstate,
 								  uint8 *new_vmbits);
 
 
@@ -784,9 +784,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 
 /*
  * Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and blk_known_av.
- * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
- * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * for heap_blk using information from PruneState and blk_known_av. Some
+ * callers may already have examined this page’s VM bits (e.g., VACUUM in the
+ * previous heap_vac_scan_next_block() call) and can pass that along as
  * blk_known_av. Callers that have not previously checked the page's status in
  * the VM should pass false for blk_known_av.
  *
@@ -806,27 +806,30 @@ heap_page_will_set_vm(Relation relation,
 					  Buffer heap_buf,
 					  Buffer vmbuffer,
 					  bool blk_known_av,
-					  const PruneFreezeResult *presult,
+					  const PruneState *prstate,
 					  uint8 *new_vmbits)
 {
 	Page		heap_page = BufferGetPage(heap_buf);
 
 	*new_vmbits = 0;
 
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		return false;
+	}
+
 	/*
 	 * Determine what the visibility map bits should be set to using the
 	 * values of all_visible and all_frozen determined during
 	 * pruning/freezing.
 	 */
-	if ((presult->all_visible && !blk_known_av) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
 	{
 		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+		if (prstate->all_frozen)
 			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		return true;
 	}
@@ -873,7 +876,7 @@ heap_page_will_set_vm(Relation relation,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -889,6 +892,30 @@ heap_page_will_set_vm(Relation relation,
 	return false;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -942,6 +969,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1097,23 +1125,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1134,18 +1147,60 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->new_vmbits = 0;
 	presult->old_vmbits = 0;
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already, so we don't need to include it again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = false;
-	if (prstate.attempt_update_vm)
-		do_set_vm = heap_page_will_set_vm(params->relation,
-										  blockno,
-										  buffer,
-										  vmbuffer,
-										  params->blk_known_av,
-										  presult,
-										  &presult->new_vmbits);
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(params->relation,
+									  blockno,
+									  buffer,
+									  vmbuffer,
+									  params->blk_known_av,
+									  &prstate,
+									  &presult->new_vmbits);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
@@ -1158,7 +1213,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		/*
 		 * It should never be the case that the visibility map page is set
 		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).
+		 * checksums are not enabled). However, we strongly prefer to keep
+		 * them in sync.
 		 *
 		 * The heap page is added to the WAL chain even if it wasn't modified,
 		 * so we still need to mark it dirty. The only scenario where it isn't
@@ -1169,7 +1225,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		MarkBufferDirty(buffer);
 		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
 												InvalidXLogRecPtr,
-												vmbuffer, presult->vm_conflict_horizon,
+												vmbuffer, vm_conflict_horizon,
 												presult->new_vmbits);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 81ef81cb8f3..29382550c03 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,32 +2016,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3511,29 +3471,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3557,15 +3494,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bb712c5b29f..392af6503da 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [application/x-patch] v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (13.7K, 6-v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 4ea8150c7cd3a2edb487f3bac1f86e574416ee67 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v25 05/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 247 ++++++++++++++++------------
 1 file changed, 142 insertions(+), 105 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1b0273c02c9..fb82b0c0f86 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid,
+									  bool blk_already_av);
 static bool heap_page_will_set_vm(Relation relation,
 								  BlockNumber heap_blk,
 								  Buffer heap_buf,
@@ -782,6 +788,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Decide whether to set the visibility map bits (all-visible and all-frozen)
  * for heap_blk using information from PruneState and blk_known_av. Some
@@ -969,7 +1033,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -977,6 +1040,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1038,6 +1104,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(params->relation,
+									  blockno, buffer, vmbuffer, params->blk_known_av,
+									  &prstate, &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+									prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1059,14 +1148,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1080,6 +1172,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 */
+			PageSetAllVisible(page);
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			Assert(old_vmbits != new_vmbits);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1087,29 +1193,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1119,46 +1208,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	presult->new_vmbits = 0;
-	presult->old_vmbits = 0;
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1173,7 +1224,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1183,50 +1235,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	/*
-	 * Decide whether to set the VM bits based on information from the VM and
-	 * the all_visible/all_frozen flags.
-	 */
-	do_set_vm = heap_page_will_set_vm(params->relation,
-									  blockno,
-									  buffer,
-									  vmbuffer,
-									  params->blk_known_av,
-									  &prstate,
-									  &presult->new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || presult->new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled). However, we strongly prefer to keep
-		 * them in sync.
-		 *
-		 * The heap page is added to the WAL chain even if it wasn't modified,
-		 * so we still need to mark it dirty. The only scenario where it isn't
-		 * modified in phase I is when the VM was truncated or removed, which
-		 * isn't worth optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
-												InvalidXLogRecPtr,
-												vmbuffer, vm_conflict_horizon,
-												presult->new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
 }
 
-- 
2.43.0
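
For readers following the horizon selection in get_conflict_xid() in the
patch above, here is a minimal standalone sketch of the same decision
order. It is an illustrative model only, not code from the patch: Xid,
xid_follows(), sketch_conflict_xid() and the VM_* / *_XID constants are
hypothetical stand-ins for TransactionId, TransactionIdFollows(),
get_conflict_xid() and the corresponding PostgreSQL definitions, and the
wraparound comparison is an assumption modeled on how
TransactionIdFollows() is understood to behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;				/* stand-in for TransactionId */

#define INVALID_XID			((Xid) 0)	/* stand-in for InvalidTransactionId */
#define FIRST_NORMAL_XID	((Xid) 3)	/* XIDs below this are special */
#define VM_ALL_VISIBLE		0x01		/* hypothetical mirror of the VM flags */
#define VM_ALL_FROZEN		0x02

/* Wraparound-aware "a is newer than b" (assumed semantics, see lead-in). */
static bool
xid_follows(Xid a, Xid b)
{
	/* Special XIDs compare as plain unsigned integers. */
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	/* Normal XIDs compare modulo 2^32. */
	return (int32_t) (a - b) > 0;
}

/* Same decision order as the patch's get_conflict_xid(), on toy types. */
static Xid
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t new_vmbits,
					Xid latest_xid_removed, Xid frz_conflict_horizon,
					Xid visibility_cutoff_xid, bool blk_already_av)
{
	Xid			conflict_xid = INVALID_XID;

	/*
	 * Only setting all-frozen on an already all-visible page, with no
	 * pruning or freezing: no snapshot conflict is possible.
	 */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && (new_vmbits & VM_ALL_FROZEN))
		return INVALID_XID;

	/* Setting the VM takes precedence over the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A removed tuple with a newer xmax pushes the horizon forward. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

Calling sketch_conflict_xid(true, false, true, VM_ALL_VISIBLE, 700,
INVALID_XID, 500, false) yields 700 in this model, because the removed
tuple's xmax is newer than the visibility cutoff.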



  [application/x-patch] v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 80189d9a76a8a993d390fc3372c1b4d866cc4fb4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v25 06/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 29382550c03..b51112a71a7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,9 +1886,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1905,13 +1908,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
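
The patches above rely on the VM-setting helper returning the previous
bits for a heap block (see old_vmbits and Assert(old_vmbits !=
new_vmbits) in pruneheap.c). The sketch below is a toy model of that
contract, not the real visibilitymap.c code: ToyVM, toy_vm_get(),
toy_vm_set() and the VM_* constants are hypothetical names, and the
packing of two bits per heap block into a byte array is an assumption
made for illustration. Buffer locking, dirtying and WAL are left out
entirely.

```c
#include <assert.h>
#include <stdint.h>

#define VM_ALL_VISIBLE		0x01	/* hypothetical mirror of the VM flags */
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03
#define BITS_PER_HEAPBLOCK	2		/* assumed: two status bits per block */
#define HEAPBLOCKS_PER_BYTE	4
#define TOY_VM_BYTES		64		/* toy map covers 256 heap blocks */

typedef struct ToyVM
{
	uint8_t		map[TOY_VM_BYTES];
} ToyVM;

/* Return the status bits currently stored for heap block blk. */
static uint8_t
toy_vm_get(const ToyVM *vm, uint32_t blk)
{
	uint32_t	byte = blk / HEAPBLOCKS_PER_BYTE;
	uint32_t	shift = (blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(byte < TOY_VM_BYTES);
	return (vm->map[byte] >> shift) & VM_VALID_BITS;
}

/* Set flags for heap block blk and return the bits that were set before. */
static uint8_t
toy_vm_set(ToyVM *vm, uint32_t blk, uint8_t flags)
{
	uint32_t	byte = blk / HEAPBLOCKS_PER_BYTE;
	uint32_t	shift = (blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		old = toy_vm_get(vm, blk);

	/* Only the two status bits may be passed in. */
	assert((flags & ~VM_VALID_BITS) == 0);

	/* Bits are only ever added here; clearing is a separate operation. */
	vm->map[byte] |= (uint8_t) (flags << shift);
	return old;
}
```

In this model a caller can compare the return value against the
requested flags to tell whether the call changed anything, which is the
kind of check the Assert in heap_page_prune_and_freeze() performs.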



  [application/x-patch] v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.4K, 8-v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 41dfb68868816c178bc2809144cb4fe6cbef8b37 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v25 07/14] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   6 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 46 insertions(+), 375 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 225f9829f22..60cc6ba998d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a09fb4b803a..b66736ea282 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fb82b0c0f86..7c36b89324e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1180,9 +1180,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * in sync.
 			 */
 			PageSetAllVisible(page);
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			Assert(old_vmbits != new_vmbits);
 		}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b51112a71a7..c18030087c1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1908,11 +1908,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..1819f3dbb77 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4306,7 +4306,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
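
For readers following the rename above, here is a minimal standalone sketch of the bit arithmetic that the surviving visibilitymap_set() performs on a pinned VM page. It is not the patch's code: the vm_* names are hypothetical, the constants assume a default build (8 kB blocks, 24-byte page header, two VM bits per heap block), and locking, buffer dirtying, and WAL are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a default build; not taken from the patch. */
#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE              (8192 - 24)	/* block size minus page header */
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  0x03

/* Which VM page holds the bits for this heap block. */
static uint32_t
vm_map_block(uint32_t heap_blk)
{
	return heap_blk / HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page. */
static uint32_t
vm_map_byte(uint32_t heap_blk)
{
	return (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within that byte. */
static uint8_t
vm_map_offset(uint32_t heap_blk)
{
	return (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/*
 * Set flags for heap_blk in the contents of one VM page and return the
 * bits that were set beforehand.  The caller is assumed to have chosen
 * the right VM page already.
 */
static uint8_t
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	byte = vm_map_byte(heap_blk);
	uint8_t		off = vm_map_offset(heap_blk);
	uint8_t		old;

	assert((flags & VM_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible */
	assert(flags != VM_ALL_FROZEN);

	old = (map[byte] >> off) & VM_VALID_BITS;
	if (flags != old)
		map[byte] |= (uint8_t) (flags << off);
	return old;
}
```

Under these assumptions, heap block 5 lands in byte 1 at bit offset 2 of VM page 0, so setting all-visible and then all-visible plus all-frozen leaves that byte at 0x04 and then 0x0C, with the second call returning the old all-visible bit.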



  [application/x-patch] v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.1K, 9-v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 66f6c73a5ab743db859e5a91790c1148ef2ff3e6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v25 08/14] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7c36b89324e..b7ccef1c084 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -252,7 +252,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -486,7 +486,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1297,11 +1297,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1760,7 +1760,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index cb5671c1a4e..3a68757c09a 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f3a1603204e..67da6737496 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4179,8 +4179,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4214,14 +4213,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4235,12 +4234,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4249,12 +4248,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4263,7 +4262,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
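
The renamed functions test an XID against two bounds after widening it to 64 bits, which is why the comment above insists on XIDs from a wraparound-protected source. The following is a simplified standalone model of that idea, not the procarray.c implementation: VisBounds, xid_relative_to() and xid_visible_to_all() are hypothetical names, and the in-between case is answered conservatively here, whereas the real code may first recompute its horizons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid32;			/* stand-in for TransactionId */
typedef uint64_t Xid64;			/* stand-in for FullTransactionId */

/* Hypothetical, simplified horizon state. */
typedef struct VisBounds
{
	Xid64		maybe_needed;		/* XIDs below this are visible to all */
	Xid64		definitely_needed;	/* XIDs at or above this are not */
} VisBounds;

/*
 * Widen a 32-bit XID by interpreting it relative to a 64-bit reference.
 * This is only meaningful if xid is within about 2^31 of the reference,
 * which is the wraparound protection the caller must guarantee.
 */
static Xid64
xid_relative_to(Xid64 rel, Xid32 xid)
{
	Xid32		rel_xid = (Xid32) rel;

	return rel + (uint64_t) (int64_t) (int32_t) (xid - rel_xid);
}

/* Two-bound test: certain yes, certain no, otherwise be conservative. */
static bool
xid_visible_to_all(const VisBounds *b, Xid32 xid)
{
	Xid64		fxid = xid_relative_to(b->definitely_needed, xid);

	if (fxid < b->maybe_needed)
		return true;
	if (fxid >= b->definitely_needed)
		return false;

	/* Between the bounds: this sketch simply answers "not yet". */
	return false;
}
```

With bounds just past a 32-bit epoch boundary, a 32-bit XID such as 0xFFFFFFF0 is mapped back into the previous epoch and so compares as older than maybe_needed, which a naive unsigned comparison of the 32-bit values would get wrong.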



  [application/x-patch] v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.6K, 10-v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
  download | inline diff:
From ebf2e3ddd0d222991bf089ddc8ac784e43dfa140 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v25 09/14] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b7ccef1c084..a5eab2b41a0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -449,11 +449,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -965,14 +966,13 @@ heap_page_will_set_vm(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1058,6 +1058,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1225,10 +1235,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1757,20 +1766,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c18030087c1..10543eca065 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2748,7 +2748,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3508,7 +3508,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3524,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3617,7 +3617,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 392af6503da..b6f1b3fb448 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -445,7 +444,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -459,6 +458,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [application/x-patch] v25-0011-Track-which-relations-are-modified-by-a-query.patch (2.5K, 11-v25-0011-Track-which-relations-are-modified-by-a-query.patch)
From 15eeb3a01ced0214bb2b189b9e273936b25d523f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v25 11/14] Track which relations are modified by a query

Save the RT indexes of relations modified by the query in a bitmap in
the EState. A later commit will pass this information down to scan nodes
to control whether the scan may set the visibility map during on-access
pruning. We don't want to set the visibility map if the query is just
going to modify the page immediately afterward.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
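
To illustrate the bookkeeping in the patch above, here is a minimal, self-contained sketch of tracking modified range table indexes in a bitmap. It is not PostgreSQL code: ToyEState, toy_mark_modified() and toy_is_modified() are hypothetical stand-ins for EState, bms_add_member() and bms_is_member(), and the toy bitmap only handles RT indexes 1..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the executor state. A single 64-bit word
 * plays the role of the Bitmapset, so only RT indexes 1..63 fit.
 */
typedef struct ToyEState
{
	uint64_t	modified_relids;	/* bit N set => RT index N is modified */
} ToyEState;

/*
 * Record that the relation at range table index rti is modified, as the
 * patch does for result relations and for relations with a rowmark.
 */
static void
toy_mark_modified(ToyEState *estate, int rti)
{
	assert(rti > 0 && rti < 64);
	estate->modified_relids |= UINT64_C(1) << rti;
}

/* Does the query modify the relation at range table index rti? */
static bool
toy_is_modified(const ToyEState *estate, int rti)
{
	assert(rti > 0 && rti < 64);
	return (estate->modified_relids & (UINT64_C(1) << rti)) != 0;
}
```

With an UPDATE target at RT index 1 and a SELECT ... FOR UPDATE rowmark at RT index 3, a scan of RT index 2 would be the only one for which toy_is_modified() returns false.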



  [application/x-patch] v25-0012-Pass-down-information-on-table-modification-to-s.patch (23.0K, 12-v25-0012-Pass-down-information-on-table-modification-to-s.patch)
From 7d352b125b3239a3a3cc030b6a58d5fcae43c139 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v25 12/14] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  4 ++--
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 91 insertions(+), 44 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 707c25289cd..468830cc0b8 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c9ef53be20..1c00e053e05 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 0eb8e0a2bb0..6319db488fc 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 40ac700d529..a6626325296 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3361,7 +3361,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b6f1b3fb448..480a1bd654f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
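
The flag plumbing in the patch above can be sketched in isolation as follows. This is a toy model under stated assumptions, not the tableam API: the TOY_* names, ToyScan, toy_scan_hint() and toy_beginscan() are hypothetical, and only the 1 << 10 bit position is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the read-only hint uses 1 << 10 as in the patch. */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_ALLOW_PAGEMODE = 1 << 1,
	TOY_SO_HINT_REL_READ_ONLY = 1 << 10
};

/* Hypothetical scan descriptor holding the final flags. */
typedef struct ToyScan
{
	uint32_t	flags;
	bool		modifies_base_rel;
} ToyScan;

/* Executor side: only rels the query does not modify get the hint. */
static uint32_t
toy_scan_hint(bool rel_is_modified)
{
	return rel_is_modified ? 0 : TOY_SO_HINT_REL_READ_ONLY;
}

/*
 * Table AM side: OR the fixed scan-type flags into whatever the caller
 * passed, instead of overwriting it, so the hint survives.
 */
static ToyScan
toy_beginscan(uint32_t flags)
{
	ToyScan		scan;

	flags |= TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;
	scan.flags = flags;
	scan.modifies_base_rel = !(flags & TOY_SO_HINT_REL_READ_ONLY);
	return scan;
}
```

Because the hint is opt-in, a caller that passes 0 (as most call sites in the patch do) is treated as if it might modify the relation, which is the conservative default.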



  [application/x-patch] v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (12.9K, 13-v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 63e4ad0f2cba76b874e8915da8a5b92e2ec00fb6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v25 13/14] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++-
 src/backend/access/heap/pruneheap.c           | 61 ++++++++++++++++---
 src/include/access/heapam.h                   | 24 +++++++-
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 101 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 60cc6ba998d..deb64e19ae8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4fe869fea99..912684ead63 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vm(Relation relation,
 								  Buffer heap_buf,
 								  Buffer vmbuffer,
 								  bool blk_known_av,
-								  const PruneState *prstate,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
+								  PruneState *prstate,
 								  uint8 *new_vmbits);
 
 
@@ -220,9 +222,13 @@ static bool heap_page_will_set_vm(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -304,6 +310,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -862,6 +875,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
  * corrupted, it will fix them by clearing the VM bits and page visibility
  * hint. This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * desired what bits should be set in the VM in *new_vmbits.
  */
@@ -871,7 +887,9 @@ heap_page_will_set_vm(Relation relation,
 					  Buffer heap_buf,
 					  Buffer vmbuffer,
 					  bool blk_known_av,
-					  const PruneState *prstate,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
+					  PruneState *prstate,
 					  uint8 *new_vmbits)
 {
 	Page		heap_page = BufferGetPage(heap_buf);
@@ -884,6 +902,24 @@ heap_page_will_set_vm(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	/*
 	 * Determine what the visibility map bits should be set to using the
 	 * values of all_visible and all_frozen determined during
@@ -906,14 +942,15 @@ heap_page_will_set_vm(Relation relation,
 	 * WAL-logged.
 	 *
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
 	 *
 	 * Callers which did not check the visibility map and determine
 	 * blk_known_av will not be eligible for this, however the cost of
 	 * potentially needing to read the visibility map for pages that are not
-	 * all-visible is too high to justify generalizing the check.
+	 * all-visible is too high to justify generalizing the check. A future
+	 * vacuum will have to take care of fixing the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1122,6 +1159,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vm(params->relation,
 									  blockno, buffer, vmbuffer, params->blk_known_av,
+									  params->reason, do_prune, do_freeze,
 									  &prstate, &new_vmbits);
 
 	/*
@@ -1193,15 +1231,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			old_vmbits = visibilitymap_set(blockno,
 										   vmbuffer, new_vmbits,
 										   params->relation->rd_locator);
-			Assert(old_vmbits != new_vmbits);
 		}
 
 		MarkBufferDirty(buffer);
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only planning to update the VM, and it turns out that it was
+		 * already set, there is no need to emit WAL. As such, we must check
+		 * again that there is some change to emit WAL for.
 		 */
-		if (RelationNeedsWAL(params->relation))
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || (do_set_vm && old_vmbits != new_vmbits)))
 		{
 			log_heap_prune_and_freeze(params->relation, buffer,
 									  do_set_vm ? vmbuffer : InvalidBuffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 480a1bd654f..89538652566 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
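
The on-access check added to heap_page_will_set_vm() in the patch above,
and the extra condition guarding WAL emission, can be sketched in
isolation. This is only a model under stated assumptions: the struct and
both function names are hypothetical, and the booleans stand in for the
results of BufferIsDirty(), XLogCheckBufferNeedsBackup() and
RelationNeedsWAL() rather than calling them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the inputs the patch's check consults. */
typedef struct OnAccessVmInputs
{
	bool		on_access;		/* reason == PRUNE_ON_ACCESS */
	bool		all_visible;	/* prstate->all_visible */
	bool		do_prune;
	bool		do_freeze;
	bool		heap_buf_dirty; /* models BufferIsDirty(heap_buf) */
	bool		needs_fpi;		/* models XLogCheckBufferNeedsBackup(heap_buf) */
} OnAccessVmInputs;

/*
 * Returns true when an on-access call should give up on setting the VM:
 * nothing is being pruned or frozen, and setting the VM would either newly
 * dirty the heap page or force a full-page image of an already dirty page.
 */
bool
skip_vm_set_on_access(const OnAccessVmInputs *in)
{
	return in->on_access &&
		in->all_visible &&
		!in->do_prune && !in->do_freeze &&
		(!in->heap_buf_dirty || in->needs_fpi);
}

/*
 * Models the WAL condition: a record is needed only if something changed.
 * A VM-only update whose bits were already set changes nothing.
 */
bool
should_emit_prune_wal(bool rel_needs_wal, bool do_prune, bool do_freeze,
					  bool do_set_vm, uint8_t old_vmbits, uint8_t new_vmbits)
{
	return rel_needs_wal &&
		(do_prune || do_freeze || (do_set_vm && old_vmbits != new_vmbits));
}
```

Keeping both decisions as pure predicates makes the corner cases easy to
enumerate, for example a clean heap page with nothing to prune, or a VM
bit that another backend already set.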



  [text/x-patch] v25-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 14-v25-0014-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 60bcb22b7f964fdfeffd285ca5dae84d663987e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v25 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index deb64e19ae8..ee95df919c7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b66736ea282..5c8dc2718ce 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
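
As a rough illustration of what setting pd_prune_xid on insert does to
the page hint, here is a simplified model. It assumes the usual behavior
of the PageSetPrunable() macro (keep the oldest prunable XID) and folds
the patch's TransactionIdIsNormal() guard into the helper. The PageHint
struct and the lower-case helper names are hypothetical, and
xid_precedes() only handles normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical stand-in for the one page header field that matters here. */
typedef struct PageHint
{
	TransactionId pd_prune_xid;
} PageHint;

/* Bootstrap and frozen XIDs (1 and 2) are not "normal". */
bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Circular (modulo 2^32) comparison; valid for normal XIDs only. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Record xid as the page's prune hint unless an older hint is already
 * present. Non-normal XIDs (e.g. in bootstrap mode) are ignored.
 */
void
page_set_prunable(PageHint *page, TransactionId xid)
{
	if (!xid_is_normal(xid))
		return;

	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}
```

With the hint set by the inserting transaction, a later reader finds a
valid pd_prune_xid and can attempt on-access pruning, which is what
gives it the chance to set the page all-visible.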



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-11 04:06  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Chao Li @ 2025-12-11 04:06 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Dec 11, 2025, at 07:35, Melanie Plageman <[email protected]> wrote:
> 
> On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
> <[email protected]> wrote:
>> 
>> In this set 0001 and 0002 are independent. 0003-0007 are all small
>> steps toward the single change in 0007 which combines the VM updates
>> into the same WAL record as pruning and freezing. 0008 and 0009 are
>> removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
>> needed to set the VM during on-access pruning. 0013 - 0015 are small
>> steps toward setting the VM on-access. And 0016 sets the prune xid on
>> insert so we may set the VM on-access for pages that have only new
>> data.
> 
> I committed 0001 and 0002. attached v25 reflects that.
> 0001-0004 refactoring steps for eliminate visible record from phase I
> (not probably independent commits in the end)
> 0005 eliminate XLOG_HEAP2_VISIBLE from phase I vac
> 0006-0007 removing the rest of XLOG_HEAP2_VISIBLE
> 0008-0010 refactoring for setting VM on-access
> 0011-0013 setting the VM on-access
> 0014 - setting pd_prune_xid on insert
> 
> - Melanie
> <v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patch><v25-0004-Move-VM-assert-into-prune-freeze-code.patch><v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v25-0011-Track-which-relations-are-modified-by-a-query.patch><v25-0012-Pass-down-information-on-table-modification-to-s.patch><v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v25-0014-Set-pd_prune_xid-on-insert.patch>

A few more small comments. Sorry for continuing to come up with new comments. I have actually learned a lot about vacuum from reviewing this patch.

1 - 0001
```
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
```

The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
```
evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
 blkno | all_visible | all_frozen
-------+-------------+------------
     0 | t           | t
(1 row)
```

As you have been using the extension pg_visibility, adding the verification with pg_visibility_map() should not be a burden.

2 - 0001
```
 		if (presult.all_frozen)
 		{
+			/*
+			 * We can pass InvalidTransactionId as our cutoff_xid, since a
+			 * snapshotConflictHorizon sufficient to make everything safe for
+			 * REDO was logged when the page's tuples were frozen.
+			 */
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
+			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 		}
```

The comment here is a little confusing. In the old code, the Assert() was immediately above the call to visibilitymap_set(), and cutoff_xid is a parameter to that call. The new code moves the Assert() and the comment far away from the call to visibilitymap_set(), so I think the comment should stay with the visibilitymap_set() call.

3 - 0002
```
 * If it finds that the page-level visibility hint or VM is corrupted, it will
* fix them by clearing the VM bits and visibility page hint. This does not
```

In the second line, “visibility page hint” is understandable but reads awkwardly. I know it means “page-level visibility hint”, so how about just “visibility hint”?

4 - 0002
```
 	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
```

Unless do_set_vm is true, old_vmbits is always 0, so this “if / else if” that tests old_vmbits should move inside “if (do_set_vm)”. Seen that way, the function can return early when do_set_vm is false, like:

```
do_set_vm = heap_page_will_set_vm(&new_vmbits);
if (!do_set_vm)
    return presult.ndeleted;

PageSetAllVisible(page);
MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(new_vmbits);
if (old_vmbits..)
{
    ..
}
else if (old_vmbits…)
{
    …
}

return presult.ndeleted;
```
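
The early-return shape suggested above can be tried out in a small,
self-contained model. This is a sketch under assumptions, not the
patch's code: VmCounters and count_vm_transition() are hypothetical
names, the "else if" branch (page already all-visible, newly all-frozen)
is a guess at what the elided condition covers, and only the two
VISIBILITYMAP_* bit values are taken from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as PostgreSQL's visibility map flags. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical counters for logging how pages changed state in the VM. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Count a VM transition. When do_set_vm is false nothing was written to
 * the VM, so old_vmbits carries no information and we return early.
 */
void
count_vm_transition(bool do_set_vm, uint8_t old_vmbits, uint8_t new_vmbits,
					VmCounters *counters)
{
	if (!do_set_vm)
		return;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page was newly set all-visible and, potentially, all-frozen. */
		counters->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
			counters->new_visible_frozen_pages++;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is now also all-frozen. */
		counters->new_frozen_pages++;
	}
}
```

The early return keeps every use of old_vmbits on the path where
visibilitymap_set() actually ran, which is the property the comment is
asking for.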

5 - 0003
```
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
-	bool		do_set_vm = false;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
+		.blk_known_av = all_visible_according_to_vm,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
```

This may be a pre-existing bug. Here presult is not initialized, and it is immediately passed to heap_page_prune_and_freeze():

```
	heap_page_prune_and_freeze(&params,
							   &presult, <=== here
							   &vacrel->offnum,
							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
```

Then heap_page_prune_and_freeze() immediately calls prune_freeze_setup():
```
	/* Initialize prstate */
	prune_freeze_setup(params,
					   new_relfrozen_xid, new_relmin_mxid,
					   presult, &prstate);
```

And prune_freeze_setup() takes presult as a const pointer:
```
static void
prune_freeze_setup(PruneFreezeParams *params,
				   TransactionId *new_relfrozen_xid,
				   MultiXactId *new_relmin_mxid,
				   const PruneFreezeResult *presult, <=== here
				   PruneState *prstate)
{
    prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets; <== here, presult->deadoffsets could be a random value
}
```

As this issue is separate from the current patch, I have posted a separate patch to fix it. Please take a look at:
https://www.postgresql.org/message-id/CAEoWx2%3DjiD1nqch4JQN%2BodAxZSD7mRvdoHUGJYN2r6tQG_66yQ%40mail...

6 - 0003
```
+ * Returns true if one or both VM bits should be set, along with returning the
+ * desired what bits should be set in the VM in *new_vmbits.
```

Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be set”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-13 13:59  Peter Eisentraut <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Peter Eisentraut @ 2025-12-13 13:59 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Kirill Reshke <[email protected]>

On 20.11.25 18:19, Melanie Plageman wrote:
> +	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;

In your patch 
v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the 
assignment above casts away the const qualification of the function 
argument presult:

+static void
+prune_freeze_setup(PruneFreezeParams *params,
+				   TransactionId new_relfrozen_xid,
+				   MultiXactId new_relmin_mxid,
+				   const PruneFreezeResult *presult,
+				   PruneState *prstate)

(The cast is otherwise unnecessary, since the underlying type is the 
same on both sides.)

Since prstate->deadoffsets is in fact later modified, this makes the 
original const qualification invalid.

I suggest the attached patch to remove the faulty const qualification 
and the then-unnecessary cast.
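
A minimal sketch of the situation, using hypothetical stand-in types
(Result and State, not the real PruneFreezeResult and PruneState): the
setup function does not write through its const parameter, but it
stores a writable pointer into the object, and later code writes
through that pointer. In C the later write is well defined as long as
the underlying object was not itself declared const, so the concern is
the misleading qualifier rather than undefined behavior.

```c
#include <assert.h>

#define MAX_OFFSETS 4

/* Hypothetical stand-ins for the result and working-state structs. */
typedef struct Result
{
	int			lpdead_items;
	unsigned short deadoffsets[MAX_OFFSETS];
} Result;

typedef struct State
{
	int			lpdead_items;
	unsigned short *deadoffsets;	/* points into a Result */
} State;

/* const parameter: storing a writable pointer requires casting it away. */
void
setup_with_cast(const Result *result, State *state)
{
	state->lpdead_items = 0;
	state->deadoffsets = (unsigned short *) result->deadoffsets;
}

/* Non-const parameter: same assignment, no cast needed. */
void
setup_plain(Result *result, State *state)
{
	state->lpdead_items = 0;
	state->deadoffsets = result->deadoffsets;
}

/* Later code modifies the Result's array through the stored pointer. */
void
record_dead(State *state, unsigned short offnum)
{
	assert(state->lpdead_items < MAX_OFFSETS);
	state->deadoffsets[state->lpdead_items++] = offnum;
}
```

Both setup variants behave identically at run time. The difference is
that the first signature promises callers something the rest of the
code does not honor, and the cast hides that from the compiler.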

From 336aa87add1a85aca84d8ca751c4187a08aa9d7f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Sat, 13 Dec 2025 14:45:08 +0100
Subject: [PATCH] Fix const qualification in prune_freeze_setup()

The const qualification of the presult argument is later cast away, so
it was not correct.
---
 src/backend/access/heap/pruneheap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..4eb49380b92 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -160,7 +160,7 @@ typedef struct
 static void prune_freeze_setup(PruneFreezeParams *params,
 							   TransactionId *new_relfrozen_xid,
 							   MultiXactId *new_relmin_mxid,
-							   const PruneFreezeResult *presult,
+							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void prune_freeze_plan(Oid reloid, Buffer buffer,
 							  PruneState *prstate,
@@ -327,7 +327,7 @@ static void
 prune_freeze_setup(PruneFreezeParams *params,
 				   TransactionId *new_relfrozen_xid,
 				   MultiXactId *new_relmin_mxid,
-				   const PruneFreezeResult *presult,
+				   PruneFreezeResult *presult,
 				   PruneState *prstate)
 {
 	/* Copy parameters to prstate */
@@ -382,7 +382,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->recently_dead_tuples = 0;
 	prstate->hastup = false;
 	prstate->lpdead_items = 0;
-	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
+	prstate->deadoffsets = presult->deadoffsets;
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-- 
2.52.0



Attachments:

  [text/plain] 0001-Fix-const-qualification-in-prune_freeze_setup.patch.nocfbot (1.6K, 2-0001-Fix-const-qualification-in-prune_freeze_setup.patch.nocfbot)



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-15 21:05  Melanie Plageman <[email protected]>
  parent: Peter Eisentraut <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-12-15 21:05 UTC (permalink / raw)
  To: Peter Eisentraut <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Kirill Reshke <[email protected]>

On Sat, Dec 13, 2025 at 8:59 AM Peter Eisentraut <[email protected]> wrote:
>
> On 20.11.25 18:19, Melanie Plageman wrote:
> > +     prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
>
> In your patch
> v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
> assignment above casts away the const qualification of the function
> argument presult:

Yea, this code (prune_freeze_setup() with a const-qualified
PruneFreezeResult parameter) is actually already in master -- not just
in this patchset.

> +static void
> +prune_freeze_setup(PruneFreezeParams *params,
> +                                  TransactionId new_relfrozen_xid,
> +                                  MultiXactId new_relmin_mxid,
> +                                  const PruneFreezeResult *presult,
> +                                  PruneState *prstate)
>
> (The cast is otherwise unnecessary, since the underlying type is the
> same on both sides.)
>
> Since prstate->deadoffsets is in fact later modified, this makes the
> original const qualification invalid.

I didn't realize I was misusing const here. What I meant to indicate
by defining the prune_freeze_setup() parameter as const is that the
PruneFreezeResult wouldn't be modified by prune_freeze_setup(). I did
not mean to indicate that no members of PruneFreezeResult would ever
be modified. deadoffsets is not modified in prune_freeze_setup(). So,
are you saying that I can't define a parameter as const even if it is
only the caller that modifies it?

I'm fine with committing a change, I just want to understand.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-15 21:29  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-15 21:29 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

Attached v26 includes a new patch, 0002, which gets rid of
all_visible_according_to_vm in lazy_scan_prune(). We've kept this
cached copy of the all-visible bit since the VM was added way back in
608195a3a365. Back then, the VM wasn't pinned unless
all_visible_according_to_vm was false. Now that we unconditionally
have the VM page pinned, there isn't much performance benefit to using
that cached value. I did some testing of the worst possible case and
saw no difference in timing. Removing it simplifies the heap vacuum
code now, and it will improve clarity further once the VM update is
combined into the prune/freeze WAL record and the VM is set on-access.

I think 0001 and 0002 (and maybe 0003) are worthwhile clarity
improvements on their own.

On Wed, Dec 10, 2025 at 11:07 PM Chao Li <[email protected]> wrote:
>
> A few more small comments. Sorry for continuing to come up with new comments. Actually I learned a lot about vacuum from reviewing this patch.

Thanks for the continued review. Your feedback is improving the patchset.

> The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
> ```
> evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
>  blkno | all_visible | all_frozen
> -------+-------------+------------
>      0 | t           | t
> (1 row)

I've done this. I've actually added three such verifications -- one
after each step where the VM is expected to change. It shouldn't be
very expensive, so I think it is okay. The way the test would fail if
the buffer wasn't correctly dirtied is that it would assert out -- so
the visibility map test wouldn't even have a chance to fail. But, I
think it is also okay to confirm that the expected things are
happening with the VM -- it just gives us extra coverage.

>                 if (presult.all_frozen)
>                 {
> +                       /*
> +                        * We can pass InvalidTransactionId as our cutoff_xid, since a
> +                        * snapshotConflictHorizon sufficient to make everything safe for
> +                        * REDO was logged when the page's tuples were frozen.
> +                        */
>                         Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
> -                       flags |= VISIBILITYMAP_ALL_FROZEN;
> +                       new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
>                 }
>
> The comment here is a little confusing. In the old code, the Assert() was immediately above the call to visibilitymap_set(), and cutoff_xid is a parameter to the call. But the new code moves the Assert() as well as the comment far away from the call to visibilitymap_set(), so I think the comment should stay together with the call to visibilitymap_set().

Good point. I've moved it closer to visibilitymap_set() and modified
and moved the assert so that it is together with the comment. I think
the comment makes little sense without the assertion.

>  * If it finds that the page-level visibility hint or VM is corrupted, it will
> * fix them by clearing the VM bits and visibility page hint. This does not
>
> In the second line, “visibility page hint” is understandable but feels not quite good. I know it’s actually “page-level visibility hint”, so how about just “visibility hint”.

I've changed this.

>         /*
> -        * As of PostgreSQL 9.2, the visibility map bit should never be set if the
> -        * page-level bit is clear.  However, it's possible that the bit got
> -        * cleared after heap_vac_scan_next_block() was called, so we must recheck
> -        * with buffer lock before concluding that the VM is corrupt.
> +        * For the purposes of logging, count whether or not the page was newly
> +        * set all-visible and, potentially, all-frozen.
>          */
> -       else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
> -                        visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
> +       if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
> +               (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
>         {
> ```
>
> Without do_set_vm==true, old_vmbits will only be 0, thus this “if-elseif” that uses old_vmbits should be moved into “if (do_set_vm)”. From this perspective, if not do_set_vm, this function can return early, like:

Good point. I've actually gone ahead in 0002 and refactored this whole
section a bit (I got rid of all_visible_according_to_vm). 0002 is a
new patch in this attached v26, and it needs review. I think this
refactoring makes the code quite a bit clearer -- especially once we
start setting the VM on-access. It does, amongst other things, return
early if all_visible is false, like you suggested.
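
The resulting control flow can be sketched as a small standalone C
function. This is a simplified model, not the code in 0002: the DEMO_*
flags, DemoCounters and demo_update_vm() are hypothetical names, and
the sketch leaves out the corruption check, PageSetAllVisible(),
MarkBufferDirty() and the actual visibilitymap_set() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the visibility map flag bits (names are hypothetical). */
#define DEMO_ALL_VISIBLE 0x01
#define DEMO_ALL_FROZEN  0x02

/* Hypothetical subset of the counters kept for vacuum logging. */
typedef struct DemoCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} DemoCounters;

/*
 * Decide which VM bits to set for one page and update the counters.
 * Returns true if the VM would need to be updated.
 */
static bool
demo_update_vm(uint8_t old_vmbits, bool all_visible, bool all_frozen,
			   DemoCounters *counters, uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	/* Early return: nothing to set if the page is not all-visible. */
	if (!all_visible)
		return false;

	*new_vmbits = DEMO_ALL_VISIBLE;
	if (all_frozen)
		*new_vmbits |= DEMO_ALL_FROZEN;

	/* Nothing to do if the VM already has exactly these bits. */
	if (old_vmbits == *new_vmbits)
		return false;

	/* Count the page as newly all-visible and/or newly all-frozen. */
	if ((old_vmbits & DEMO_ALL_VISIBLE) == 0)
	{
		counters->new_visible_pages++;
		if (all_frozen)
			counters->new_visible_frozen_pages++;
	}
	else if ((old_vmbits & DEMO_ALL_FROZEN) == 0 && all_frozen)
		counters->new_frozen_pages++;

	return true;
}
```

Because old_vmbits is read once up front, both early returns and the
counter updates work from the same value, with no separate cached copy
to keep in sync.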

> + * Returns true if one or both VM bits should be set, along with returning the
> + * desired what bits should be set in the VM in *new_vmbits.
> ```
>
> Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be set”.

Fixed.

- Melanie


Attachments:

  [text/x-patch] v26-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (9.6K, 2-v26-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
From 0749b6a9978f6e74af89d91b8beddf0fa1c7ed03 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v26 01/15] Combine visibilitymap_set() cases in
 lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would fail
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 35 +++++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 16 ++++
 src/backend/access/heap/vacuumlazy.c          | 95 +++++--------------
 3 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..cbc04aad016 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,41 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (0,0)
+(1 row)
+
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..0d13116248b 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,22 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..811e7e33678 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2094,16 +2094,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
+			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
 		/*
 		 * It should never be the case that the visibility map page is set
@@ -2111,19 +2109,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
+		 * before adding it to the WAL chain. The only scenario where it is
+		 * not already dirty is if the VM was removed, and that isn't worth
+		 * optimizing for.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!presult.all_frozen ||
+			   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+									   vmbuffer,
+									   presult.vm_conflict_horizon,
+									   new_vmbits);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
@@ -2191,65 +2199,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v26-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (13.3K, 3-v26-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
From ee1ff0f2e7322f0e034083d37ceab3d8a8ff374d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v26 02/15] Eliminate use of cached VM value in
 lazy_scan_prune()

lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.

Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.

Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible that after
fixing corruption, the VM could be newly set, if pruning found the page
all-visible.

Author: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 172 ++++++++++++---------------
 1 file changed, 79 insertions(+), 93 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 811e7e33678..436143cd12c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -253,7 +253,6 @@ typedef enum
  * about the block it read to the caller.
  */
 #define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
 
 typedef struct LVRelState
 {
@@ -358,7 +357,6 @@ typedef struct LVRelState
 	/* State maintained by heap_vac_scan_next_block() */
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
-	bool		next_unskippable_allvis;	/* its visibility status */
 	bool		next_unskippable_eager_scanned; /* if it was eagerly scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
 
@@ -432,7 +430,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
+							Buffer vmbuffer,
 							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1249,7 +1247,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	/* Initialize for the first heap_vac_scan_next_block() call */
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
-	vacrel->next_unskippable_allvis = false;
 	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
@@ -1265,13 +1262,13 @@ lazy_scan_heap(LVRelState *vacrel)
 										MAIN_FORKNUM,
 										heap_vac_scan_next_block,
 										vacrel,
-										sizeof(uint8));
+										sizeof(bool));
 
 	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		uint8		blk_info = 0;
+		bool		was_eager_scanned = false;
 		int			ndeleted = 0;
 		bool		has_lpdead_items;
 		void	   *per_buffer_data = NULL;
@@ -1340,13 +1337,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!BufferIsValid(buf))
 			break;
 
-		blk_info = *((uint8 *) per_buffer_data);
+		was_eager_scanned = *((bool *) per_buffer_data);
 		CheckBufferIsPinnedOnce(buf);
 		page = BufferGetPage(buf);
 		blkno = BufferGetBlockNumber(buf);
 
 		vacrel->scanned_pages++;
-		if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+		if (was_eager_scanned)
 			vacrel->eager_scanned_pages++;
 
 		/* Report as block scanned, update error traceback information */
@@ -1417,7 +1414,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
 									   vmbuffer,
-									   blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
 									   &has_lpdead_items, &vm_page_frozen);
 
 		/*
@@ -1434,8 +1430,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * exclude pages skipped due to cleanup lock contention from eager
 		 * freeze algorithm caps.
 		 */
-		if (got_cleanup_lock &&
-			(blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+		if (got_cleanup_lock && was_eager_scanned)
 		{
 			/* Aggressive vacuums do not eager scan. */
 			Assert(!vacrel->aggressive);
@@ -1602,7 +1597,6 @@ heap_vac_scan_next_block(ReadStream *stream,
 {
 	BlockNumber next_block;
 	LVRelState *vacrel = callback_private_data;
-	uint8		blk_info = 0;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
@@ -1665,8 +1659,8 @@ heap_vac_scan_next_block(ReadStream *stream,
 		 * otherwise they would've been unskippable.
 		 */
 		vacrel->current_block = next_block;
-		blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		*((uint8 *) per_buffer_data) = blk_info;
+		/* Block was not eager scanned */
+		*((bool *) per_buffer_data) = false;
 		return vacrel->current_block;
 	}
 	else
@@ -1678,11 +1672,7 @@ heap_vac_scan_next_block(ReadStream *stream,
 		Assert(next_block == vacrel->next_unskippable_block);
 
 		vacrel->current_block = next_block;
-		if (vacrel->next_unskippable_allvis)
-			blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		if (vacrel->next_unskippable_eager_scanned)
-			blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
-		*((uint8 *) per_buffer_data) = blk_info;
+		*((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
 		return vacrel->current_block;
 	}
 }
@@ -1707,7 +1697,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
 	bool		next_unskippable_eager_scanned = false;
-	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
@@ -1717,7 +1706,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 													   next_unskippable_block,
 													   &next_unskippable_vmbuffer);
 
-		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
 		/*
 		 * At the start of each eager scan region, normal vacuums with eager
@@ -1736,7 +1724,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
 		 */
-		if (!next_unskippable_allvis)
+		if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
 			break;
@@ -1793,7 +1781,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
-	vacrel->next_unskippable_allvis = next_unskippable_allvis;
 	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
@@ -1954,9 +1941,7 @@ cmpOffsetNumbers(const void *a, const void *b)
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
  * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1973,7 +1958,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
 				bool *has_lpdead_items,
 				bool *vm_page_frozen)
 {
@@ -1987,6 +1971,8 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
+	uint8		old_vmbits = 0;
+	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2089,70 +2075,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
-		 * before adding it to the WAL chain. The only scenario where it is
-		 * not already dirty is if the VM was removed, and that isn't worth
-		 * optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!presult.all_frozen ||
-			   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer,
-									   presult.vm_conflict_horizon,
-									   new_vmbits);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
+	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
 	/*
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2160,8 +2083,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
 	 * with buffer lock before concluding that the VM is corrupt.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if (!PageIsAllVisible(page) &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2199,6 +2122,69 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
+	if (!presult.all_visible)
+		return presult.ndeleted;
+
+	/* Set the visibility map and page visibility hint */
+	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (presult.all_frozen)
+		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	/* Nothing to do */
+	if (old_vmbits == new_vmbits)
+		return presult.ndeleted;
+
+	Assert(presult.all_visible);
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).  Regardless, set both bits so that we get back in
+	 * sync.
+	 *
+	 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+	 * unnecessarily dirtying the heap buffer, as it must be marked dirty
+	 * before adding it to the WAL chain. The only scenario where it is not
+	 * already dirty is if the VM was removed, and that isn't worth optimizing
+	 * for.
+	 */
+	PageSetAllVisible(page);
+	MarkBufferDirty(buf);
+
+	/*
+	 * If the page is being set all-frozen, we pass InvalidTransactionId as
+	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+	 * everything safe for REDO was logged when the page's tuples were frozen.
+	 */
+	Assert(!presult.all_frozen ||
+		   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+	visibilitymap_set(vacrel->rel, blkno, buf,
+					  InvalidXLogRecPtr,
+					  vmbuffer, presult.vm_conflict_horizon,
+					  new_vmbits);
+
+	/*
+	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
+	 * count it as newly set for logging.
+	 */
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	{
+		vacrel->vm_new_visible_pages++;
+		if (presult.all_frozen)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 presult.all_frozen)
+	{
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
+
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v26-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.5K, 4-v26-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
From ad1ddde609491f606adc4b87429c380e3b86ad52 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v26 03/15] Refactor lazy_scan_prune() VM clear logic into
 helper

Encapsulating the VM clear logic in a helper makes lazy_scan_prune()
clearer. Before we move all of this logic into
heap_page_prune_and_freeze(), we want to make it more compact and clear.
---
 src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
 1 file changed, 78 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 436143cd12c..425dc2f8691 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,6 +428,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page,
+										   int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1935,6 +1940,77 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2077,50 +2153,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(page) &&
-		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
+	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+								   presult.lpdead_items, vmbuffer, old_vmbits);
 
 	if (!presult.all_visible)
 		return presult.ndeleted;
-- 
2.43.0
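
A note for reviewers on the helper factored out above: its two checks can be
read as a pure classification of (page-level hint, VM bits, LP_DEAD count).
The sketch below is only an illustration, not code from the patch. The names
classify_vm_state, VmCheckResult, and VM_ALL_VISIBLE are hypothetical
stand-ins; the real helper emits a WARNING and clears the bits rather than
returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for VISIBILITYMAP_ALL_VISIBLE; defined here for illustration. */
#define VM_ALL_VISIBLE 0x01

/* Hypothetical result type; the patch reports via ereport(WARNING) instead. */
typedef enum
{
	VM_STATE_OK,
	VM_BIT_SET_BUT_PAGE_HINT_CLEAR,	/* first check in the helper */
	PAGE_HINT_SET_WITH_LP_DEAD		/* second check in the helper */
} VmCheckResult;

/*
 * Mirror the if / else-if structure of the helper: the VM bit must not be
 * set while the page-level hint is clear, and a page whose hint is set must
 * not contain LP_DEAD items.
 */
static VmCheckResult
classify_vm_state(bool page_all_visible, uint8_t vmbits, int nlpdead_items)
{
	if (!page_all_visible && (vmbits & VM_ALL_VISIBLE) != 0)
		return VM_BIT_SET_BUT_PAGE_HINT_CLEAR;
	else if (page_all_visible && nlpdead_items > 0)
		return PAGE_HINT_SET_WITH_LP_DEAD;

	return VM_STATE_OK;
}
```

Note that a page with the hint set, no LP_DEAD items, and a clear VM bit is
classified as OK here, matching the helper, which treats only the two
combinations above as corruption.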



  [text/x-patch] v26-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.2K, 5-v26-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
  download | inline diff:
From 0c36849792d268443c4de300e86008f7cd1adefa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v26 04/15] Set the VM in heap_page_prune_and_freeze()

This commit has no independent benefit; it exists only to ease review.
After it, setting the VM following pruning and freezing still emits a
separate WAL record. Moving the logic into pruneheap.c in its own commit
keeps that move separate from the later change that sets the VM in the
same WAL record as pruning and freezing.
---
 src/backend/access/heap/pruneheap.c  | 301 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 142 +------------
 src/include/access/heapam.h          |  21 ++
 3 files changed, 285 insertions(+), 179 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..0d825228b62 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+								  Relation relation,
+								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+								  Buffer vmbuffer,
+								  int nlpdead_items,
+								  uint8 *old_vmbits,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -338,6 +353,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -386,51 +403,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * while the page is all-visible. As soon as a tuple is encountered that
+	 * is not visible to all, this field is no longer maintained. As long as
+	 * it is maintained, it can be used to calculate the snapshot conflict
+	 * horizon when updating the VM and/or freezing all tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -765,10 +785,134 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	if (!prstate->attempt_update_vm)
+		return false;
+
+	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
+										   &vmbuffer);
+
+	/* Check for corruption even if the page is not all-visible */
+	identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+								   nlpdead_items, vmbuffer,
+								   *old_vmbits);
+
+	if (!prstate->all_visible)
+		return false;
+
+	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->all_frozen)
+		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (*new_vmbits == *old_vmbits)
+	{
+		*new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -783,12 +927,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -813,13 +958,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1001,6 +1152,64 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * If we do not intend to set the VM, new_vmbits should be 0 regardless
+	 * of whether or not the page is all-visible.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	/* Set the visibility map and page visibility hint, if relevant */
+	if (do_set_vm)
+	{
+		Assert(prstate.all_visible);
+
+		/*
+		 * It should never be the case that the visibility map bit is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
+		 * before adding it to the WAL chain. The only scenario where it is
+		 * not already dirty is if the VM was removed, and that isn't worth
+		 * optimizing for.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!prstate.all_frozen ||
+			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+		visibilitymap_set(params->relation, blockno, buffer,
+						  InvalidXLogRecPtr,
+						  vmbuffer, presult->vm_conflict_horizon,
+						  new_vmbits);
+	}
+
+	/* Save the vmbits for caller */
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = new_vmbits;
 }
 
 
@@ -1475,6 +1684,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 425dc2f8691..ccfad5b2dba 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,11 +428,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1940,77 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2042,13 +1966,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2148,73 +2071,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer, old_vmbits);
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	Assert(presult.all_visible);
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled).  Regardless, set both bits so that we get back in
-	 * sync.
-	 *
-	 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-	 * unnecessarily dirtying the heap buffer, as it must be marked dirty
-	 * before adding it to the WAL chain. The only scenario where it is not
-	 * already dirty is if the VM was removed, and that isn't worth optimizing
-	 * for.
-	 */
-	PageSetAllVisible(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
 	/*
 	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
 	 * count it as newly set for logging.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if (presult.all_frozen)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..f3fa61c9c1b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block of
+	 * the visibility map if we are to update it.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before it is updated during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
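
For anyone reviewing 0004, the decision made by heap_page_will_set_vm() and
the counter bookkeeping left in lazy_scan_prune() can be modeled without any
buffers or relations. The following is a rough sketch under that assumption,
not code from the patch: will_set_vm, count_vm_change, VmCounters, and the
VM_* macros are hypothetical names, and the corruption check is omitted.
Unlike the patch, which relies on the caller initializing new_vmbits to 0,
the sketch zeroes it on entry so it is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the VISIBILITYMAP_* flag bits; defined here for illustration. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/*
 * Decide whether the VM needs to be set. Returns true and fills *new_vmbits
 * only when the desired bits differ from the bits already in the VM.
 */
static bool
will_set_vm(bool attempt_update_vm, bool all_visible, bool all_frozen,
			uint8_t old_vmbits, uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	if (!attempt_update_vm || !all_visible)
		return false;

	*new_vmbits = VM_ALL_VISIBLE;
	if (all_frozen)
		*new_vmbits |= VM_ALL_FROZEN;

	/* Nothing to do if the VM already has exactly these bits */
	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;
		return false;
	}

	return true;
}

/* Hypothetical stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} VmCounters;

/* Count a page as newly all-visible and/or newly all-frozen. */
static void
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & VM_ALL_VISIBLE) != 0)
	{
		c->new_visible++;
		if ((new_vmbits & VM_ALL_FROZEN) != 0)
			c->new_visible_frozen++;
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & VM_ALL_FROZEN) != 0)
		c->new_frozen++;
}
```

Because new_vmbits is 0 whenever the VM is left alone, the counting code has
to test new_vmbits as well as old_vmbits, which is why the conditions in
lazy_scan_prune() gained the extra new_vmbits checks in this patch.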



  [text/x-patch] v26-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v26-0005-Move-VM-assert-into-prune-freeze-code.patch)
  download | inline diff:
From 06e17918d799a4b654eccd76e1a39b2bd49e505b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v26 05/15] Move VM assert into prune/freeze code

This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
 src/backend/access/heap/pruneheap.c  | 86 ++++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 68 +---------------------
 src/include/access/heapam.h          | 25 +++-----
 3 files changed, 77 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0d825228b62..ab567e7518b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -908,6 +908,31 @@ heap_page_will_set_vm(PruneState *prstate,
 	return true;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD items on the page. Currently assert-only, but there
+ * is no reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -961,6 +986,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1119,23 +1145,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1153,6 +1164,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
 	/* Now update the visibility map and PD_ALL_VISIBLE hint */
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
@@ -1198,12 +1249,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * make everything safe for REDO was logged when the page's tuples
 		 * were frozen.
 		 */
-		Assert(!prstate.all_frozen ||
-			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
 
 		visibilitymap_set(params->relation, blockno, buffer,
 						  InvalidXLogRecPtr,
-						  vmbuffer, presult->vm_conflict_horizon,
+						  vmbuffer, vm_conflict_horizon,
 						  new_vmbits);
 	}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ccfad5b2dba..3fa03470722 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -462,20 +462,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2012,32 +1998,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3494,29 +3454,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3540,15 +3477,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f3fa61c9c1b..9100d42ccbb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v26-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.3K, 7-v26-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From 548a9a8ae3c633e5ab2cfec438aca03ba9d1e6f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v26 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
 1 file changed, 157 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab567e7518b..5a7ba904f1a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid);
 
 
 /*
@@ -785,6 +790,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		do_set_vm &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Helper to correct any corruption detected on an heap page and its
  * corresponding visibility map page after pruning but before setting the
@@ -986,7 +1053,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -994,10 +1060,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 	uint8		new_vmbits = 0;
 	uint8		old_vmbits = 0;
 
-
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
@@ -1058,6 +1124,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									old_vmbits, new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1079,14 +1176,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1100,6 +1200,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * Even if we are only setting the VM and PD_ALL_VISIBLE is
+			 * already set, we don't need to worry about unnecessarily
+			 * dirtying the heap buffer below, as it must be marked dirty
+			 * before adding it to the WAL chain. The only scenario where it
+			 * is not already dirty is if the VM was removed, and that isn't
+			 * worth optimizing for.
+			 */
+			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+									 params->relation->rd_locator);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1107,29 +1227,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1139,43 +1242,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1190,7 +1258,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1200,66 +1269,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	do_set_vm = heap_page_will_set_vm(&prstate,
-									  params->relation,
-									  blockno,
-									  buffer,
-									  page,
-									  vmbuffer,
-									  prstate.lpdead_items,
-									  &old_vmbits,
-									  &new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	/* Set the visibility map and page visibility hint, if relevant */
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		Assert(prstate.all_visible);
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
-		 * before adding it to the WAL chain. The only scenario where it is
-		 * not already dirty is if the VM was removed, and that isn't worth
-		 * optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
-		visibilitymap_set(params->relation, blockno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, vm_conflict_horizon,
-						  new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	/* Save the vmbits for caller */
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = new_vmbits;
 }
 
 
-- 
2.43.0
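
For readers following the conflict-horizon logic in the patch above, here
is a minimal standalone sketch of the selection rule in get_conflict_xid().
It is not the patch's code: the typedef, the constants, and the names
xid_follows(), sketch_conflict_xid() and run_examples() are simplified
stand-ins assumed for illustration, and xid_follows() only imitates the
wraparound-aware comparison that TransactionIdFollows() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL definitions (assumed for this sketch). */
typedef uint32_t TransactionId;
#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Wraparound-aware "id1 is logically later than id2" (hypothetical helper). */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    /* Special XIDs (below FirstNormalTransactionId) compare as plain integers. */
    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Sketch of the horizon choice for a combined prune/freeze/VM record. */
static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                    uint8_t old_vmbits, uint8_t new_vmbits,
                    TransactionId latest_xid_removed,
                    TransactionId frz_conflict_horizon,
                    TransactionId visibility_cutoff_xid)
{
    TransactionId conflict_xid = InvalidTransactionId;

    /* Only marking an already all-visible page all-frozen: no conflict. */
    if (!do_prune && !do_freeze && do_set_vm &&
        (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
        (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
        return InvalidTransactionId;

    /* Start from the VM horizon, else from the freeze horizon. */
    if (do_set_vm)
        conflict_xid = visibility_cutoff_xid;
    else if (do_freeze)
        conflict_xid = frz_conflict_horizon;

    /* A removed tuple with a later xmax pushes the horizon forward. */
    if (xid_follows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;

    return conflict_xid;
}

/* A few worked cases; the XID values are made up for illustration. */
static void
run_examples(void)
{
    /* Prune and set VM: the removed xmax (900) is later than the cutoff (800). */
    assert(sketch_conflict_xid(true, false, true, 0,
                               VISIBILITYMAP_ALL_VISIBLE,
                               900, 0, 800) == 900);

    /* Prune and freeze, no VM update: the freeze horizon (750) wins over 700. */
    assert(sketch_conflict_xid(true, true, false, 0, 0,
                               700, 750, 0) == 750);

    /* Comparison is modulo 2^32: XID 5 follows an XID just below wraparound. */
    assert(xid_follows(5, 4294967290u));
}
```

Calling run_examples() from a small main() is enough to exercise the
sketch. The point it tries to show is that a single record needs a single
horizon, so the function picks the latest of the horizons that apply to
the modifications actually being made.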



  [text/x-patch] v26-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v26-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 703581cd909b354b4d1028eeba57f7edd45836e8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v26 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3fa03470722..210afa11346 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1873,9 +1873,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1892,13 +1895,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v26-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.3K, 9-v26-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From 322cb242b4a861f792fec7b6e5614bf0a9fc2dad Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v26 08/15] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type. This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 45 insertions(+), 374 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5a7ba904f1a..28a50c83af4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1216,8 +1216,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * worth optimizing for.
 			 */
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
-									 params->relation->rd_locator);
+			visibilitymap_set(blockno, vmbuffer, new_vmbits,
+							  params->relation->rd_locator);
 		}
 
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 210afa11346..87820f3ff49 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1895,11 +1895,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3451538565e..267d7dc5524 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4330,7 +4330,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
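
Editor's sketch (not part of the patch): the bit arithmetic that both the
removed and the surviving visibilitymap_set() rely on can be illustrated in
isolation. The example below is a minimal, standalone approximation. All
sketch_* and SKETCH_* names are hypothetical, and the constants assume the
default 8 kB block size with a 24-byte page header; the real definitions live
in visibilitymap.c and visibilitymapdefs.h. It only models "two bits per heap
block, OR the new flags in, return the previous status" and does no locking,
WAL logging, or buffer management.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values; the real macros are derived from BLCKSZ in visibilitymap.c. */
#define SKETCH_BITS_PER_HEAPBLOCK   2
#define SKETCH_HEAPBLOCKS_PER_BYTE  4       /* 8 bits / 2 bits per heap block */
#define SKETCH_MAPSIZE              8168    /* assumed: 8192 minus a 24-byte header */
#define SKETCH_HEAPBLOCKS_PER_PAGE  (SKETCH_MAPSIZE * SKETCH_HEAPBLOCKS_PER_BYTE)

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE / _ALL_FROZEN / _VALID_BITS. */
#define SKETCH_ALL_VISIBLE  0x01
#define SKETCH_ALL_FROZEN   0x02
#define SKETCH_VALID_BITS   0x03

/* Hypothetical stand-in for the contents area of one VM page. */
typedef struct SketchVMPage
{
    uint8_t     map[SKETCH_MAPSIZE];
} SketchVMPage;

/* Which VM page covers this heap block. */
static uint32_t
sketch_map_block(uint32_t heap_blk)
{
    return heap_blk / SKETCH_HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page. */
static uint32_t
sketch_map_byte(uint32_t heap_blk)
{
    return (heap_blk % SKETCH_HEAPBLOCKS_PER_PAGE) / SKETCH_HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within that byte. */
static uint8_t
sketch_map_offset(uint32_t heap_blk)
{
    return (uint8_t) ((heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
                      SKETCH_BITS_PER_HEAPBLOCK);
}

/*
 * OR the requested flags into the map and return the status the heap block
 * had before the call, mirroring the return convention of the removed
 * visibilitymap_set().
 */
static uint8_t
sketch_vm_set(SketchVMPage *page, uint32_t heap_blk, uint8_t flags)
{
    uint32_t    byte = sketch_map_byte(heap_blk);
    uint8_t     off = sketch_map_offset(heap_blk);
    uint8_t     status = (page->map[byte] >> off) & SKETCH_VALID_BITS;

    if (flags != status)
        page->map[byte] |= (uint8_t) (flags << off);

    return status;
}
```

Under these assumptions, heap block 5 maps to byte 1 at bit offset 2 of VM
page 0, so setting both flags on a zeroed page leaves that byte at 0x0C.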



  [text/x-patch] v26-0009-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.1K, 10-v26-0009-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
From cf4615d826340a62957be22d0c41f787194f065d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v26 09/15] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order
to decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28a50c83af4..0574c78a5eb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,7 +255,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -488,7 +488,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1331,11 +1331,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1794,7 +1794,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index cb5671c1a4e..3a68757c09a 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f3a1603204e..67da6737496 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4179,8 +4179,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4214,14 +4213,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4235,12 +4234,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4249,12 +4248,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4263,7 +4262,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v26-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.6K, 11-v26-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From cc69c22be974c579e9fdc186f83dd063c40e06bb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v26 10/15] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0574c78a5eb..e9754d43f72 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -451,11 +451,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -984,14 +985,13 @@ heap_page_will_set_vm(PruneState *prstate,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1078,6 +1078,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1259,10 +1269,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1791,20 +1800,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 87820f3ff49..479fb096974 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3491,7 +3491,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3507,7 +3507,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3581,7 +3581,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3600,7 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9100d42ccbb..a33b5ef55a8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -272,10 +272,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -439,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -453,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v26-0011-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 12-v26-0011-Unset-all_visible-sooner-if-not-freezing.patch)
From 6e10d58579ae071085bdf0ffd610c79129b549d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v26 11/15] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e9754d43f72..a4c3bd00253 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1657,8 +1657,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1917,8 +1922,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v26-0012-Track-which-relations-are-modified-by-a-query.patch (2.5K, 13-v26-0012-Track-which-relations-are-modified-by-a-query.patch)
From 8493882305b7c632c98bfda7dafcdb4785e3892c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v26 12/15] Track which relations are modified by a query

Save the range table indexes of relations modified by the query in a
bitmap in the EState. A later commit will pass this information down to
scan nodes to control whether the scan may set the visibility map during
on-access pruning. We don't want to set the visibility map if the query
is just going to modify the page immediately after.
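
The bookkeeping amounts to set membership keyed by range table index. A
minimal sketch follows, assuming a fixed-size toy bitmap (ToyRelidSet,
toy_mark_modified, and toy_is_modified are hypothetical names) in place of
the Bitmapset the patch actually uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The toy set covers range table indexes 1..64 only (an assumption). */
#define TOY_MAX_RTI 64

typedef struct ToyRelidSet
{
	uint64_t	bits;
} ToyRelidSet;

/* Mark a range table index as modified by the query. */
static void
toy_mark_modified(ToyRelidSet *set, int rti)
{
	assert(rti >= 1 && rti <= TOY_MAX_RTI);
	set->bits |= UINT64_C(1) << (rti - 1);
}

/* Report whether a range table index was marked as modified. */
static bool
toy_is_modified(const ToyRelidSet *set, int rti)
{
	assert(rti >= 1 && rti <= TOY_MAX_RTI);
	return (set->bits & (UINT64_C(1) << (rti - 1))) != 0;
}
```

Marking index 2 as a result relation and index 3 as a row-marked relation
leaves index 1 unmarked, so only a scan of relation 1 would later be
treated as read-only.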
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v26-0013-Pass-down-information-on-table-modification-to-s.patch (23.7K, 14-v26-0013-Pass-down-information-on-table-modification-to-s.patch)
From 1d7279c53118e38246395b3ba575db48dc8172fd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v26 13/15] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
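
The hand-off can be sketched in isolation. The names below
(TOY_HINT_REL_READ_ONLY, toy_scan_flags, toy_fetch_begin) and the bit
value are hypothetical and only mirror the shape of the change: the scan
node turns "is this relation modified?" into a hint flag, and the access
method derives its own state from that flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the real flag lives with the scan options. */
#define TOY_HINT_REL_READ_ONLY (UINT32_C(1) << 0)

typedef struct ToyFetchState
{
	bool		modifies_base_rel;
} ToyFetchState;

/* Scan node side: set the hint only when the relation is not modified. */
static uint32_t
toy_scan_flags(bool rel_is_modified)
{
	return rel_is_modified ? 0 : TOY_HINT_REL_READ_ONLY;
}

/* Access method side: without the hint, assume the relation is modified. */
static ToyFetchState
toy_fetch_begin(uint32_t flags)
{
	ToyFetchState st;

	/* The toy knows a single flag, so reject anything else. */
	assert((flags & ~TOY_HINT_REL_READ_ONLY) == 0);

	st.modifies_base_rel = !(flags & TOY_HINT_REL_READ_ONLY);
	return st;
}
```

One consequence of this shape is that callers passing 0 are conservatively
treated as possibly modifying the relation, so existing call sites keep
their current behavior unless they opt in.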
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 93 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a33b5ef55a8..ba3ff8c0845 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v26-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.3K, 15-v26-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 1153915bc4db207f63e2718234478d2237fdb73d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v26 14/15] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++++-
 src/backend/access/heap/pruneheap.c           | 44 +++++++++++++++++--
 src/include/access/heapam.h                   | 24 ++++++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 90 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a4c3bd00253..d1ec6d1b601 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  Relation relation,
 								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 								  Buffer vmbuffer,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -935,6 +948,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
  * corrupted, it will fix them by clearing the VM bits and visibility hint.
  * This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * current value of the VM bits in *old_vmbits and the desired new value of
  * the VM bits in *new_vmbits.
@@ -944,6 +960,8 @@ heap_page_will_set_vm(PruneState *prstate,
 					  Relation relation,
 					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 					  Buffer vmbuffer,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
 					  int nlpdead_items,
 					  uint8 *old_vmbits,
 					  uint8 *new_vmbits)
@@ -951,6 +969,24 @@ heap_page_will_set_vm(PruneState *prstate,
 	if (!prstate->attempt_update_vm)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
 										   &vmbuffer);
 
@@ -1146,6 +1182,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  buffer,
 									  page,
 									  vmbuffer,
+									  params->reason,
+									  do_prune, do_freeze,
 									  prstate.lpdead_items,
 									  &old_vmbits,
 									  &new_vmbits);
@@ -1232,9 +1270,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		MarkBufferDirty(buffer);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
+		/* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did */
 		if (RelationNeedsWAL(params->relation))
 		{
 			log_heap_prune_and_freeze(params->relation, buffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ba3ff8c0845..c835c792c80 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v26-0015-Set-pd_prune_xid-on-insert.patch (6.7K, 16-v26-0015-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 86187669b4c2590e50ac5fe6111cb1531a9bceee Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v26 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-16 12:18  Peter Eisentraut <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Peter Eisentraut @ 2025-12-16 12:18 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Kirill Reshke <[email protected]>

On 15.12.25 22:05, Melanie Plageman wrote:
> On Sat, Dec 13, 2025 at 8:59 AM Peter Eisentraut <[email protected]> wrote:
>>
>> On 20.11.25 18:19, Melanie Plageman wrote:
>>> +     prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
>>
>> In your patch
>> v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
>> assignment above casts away the const qualification of the function
>> argument presult:
> 
> Yea, this code (prune_freeze_setup() with a const-qualified
> PruneFreezeResult parameter) is actually already in master -- not just
> in this patchset.
> 
>> +static void
>> +prune_freeze_setup(PruneFreezeParams *params,
>> +                                  TransactionId new_relfrozen_xid,
>> +                                  MultiXactId new_relmin_mxid,
>> +                                  const PruneFreezeResult *presult,
>> +                                  PruneState *prstate)
>>
>> (The cast is otherwise unnecessary, since the underlying type is the
>> same on both sides.)
>>
>> Since prstate->deadoffsets is in fact later modified, this makes the
>> original const qualification invalid.
> 
> I didn't realize I was misusing const here. What I meant to indicate
> by defining the prune_freeze_setup() parameter, as const, is that the
> PruneFreezeResult wouldn't be modified by prune_freeze_setup(). I did
> not mean to indicate that no members of PruneFreezeResult would ever
> be modified.

I'm not sure there is a difference between these two statements.  Saying
that the struct won't be modified is the same as saying that none of its
fields will be modified.

> deadoffsets is not modified in prune_freeze_setup(). So,
> are you saying that I can't define a parameter as const if even the
> caller modifies it?

You are not modifying deadoffsets in prune_freeze_setup(), but you are 
assigning its address to a pointer variable that is not const-qualified, 
and so it could be used to modify it later on.

A caller to prune_freeze_setup() that sees the signature const 
PruneFreezeResult *presult could pass a pointer to a PruneFreezeResult 
object that is notionally in read-only memory.  But through the 
non-const-qualified pointer you could later modify the pointed-to 
memory, which would be invalid.  The point of propagating the qualifiers 
is to prevent that at compile time.

If what you want is something like, "prune_freeze_setup() does not 
change any of the fields of what presult points to, but it does record a 
pointer to one of its fields with the intention of modifying it later 
after prune_freeze_setup() is finished", then I think C cannot represent 
that with this API.

Here is a simplified example:

#include <stdlib.h>

// corresponds to PruneFreezeResult
struct foo
{
	int offsets[5];
};

// corresponds to PruneState
struct bar
{
	int *offsets;
};

static void setup(const struct foo *f)
{
	struct bar *b = malloc(sizeof(struct bar));

	b->offsets = f->offsets;  // warning
}

This produces a warning:

test.c:20:20: warning: assignment discards 'const' qualifier from 
pointer target type

The reason is that what "f" points to is const, which means that all its 
fields are const.  The fix is to remove the const from the function 
argument declaration.

One of the possible sources of confusion here is that one struct uses an 
array and the other a pointer, and these sometimes behave similarly and 
sometimes not.
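
As a minimal sketch of that fix (not the committed change), here is the
same example with the const dropped from the function argument.  The
second parameter to setup() and the record_dead() helper are hypothetical
additions for illustration, so that the stored pointer can actually be
written through later; neither corresponds to anything in PostgreSQL.

```c
/* corresponds to PruneFreezeResult */
struct foo
{
	int offsets[5];
};

/* corresponds to PruneState */
struct bar
{
	int *offsets;
};

/*
 * No const on f: setup() does not write through f itself, but it stores a
 * non-const pointer into *f, so callers must pass a writable object.
 * (Taking b as a parameter is an assumption made for this sketch.)
 */
static void
setup(struct foo *f, struct bar *b)
{
	b->offsets = f->offsets;	/* int[5] decays to int *, no warning */
}

/* hypothetical later step that modifies the array via the saved pointer */
static void
record_dead(struct bar *b, int idx, int off)
{
	b->offsets[idx] = off;
}
```

With a writable struct foo f, calling setup(&f, &b) and then
record_dead(&b, 2, 7) leaves f.offsets[2] equal to 7.  Passing a
const struct foo * to this setup() is now diagnosed at the call site,
which is where the mismatch belongs.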






^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-16 16:07  Melanie Plageman <[email protected]>
  parent: Peter Eisentraut <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-16 16:07 UTC (permalink / raw)
  To: Peter Eisentraut <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Kirill Reshke <[email protected]>

On Tue, Dec 16, 2025 at 7:18 AM Peter Eisentraut <[email protected]> wrote:
>
> You are not modifying deadoffsets in prune_freeze_setup(), but you are
> assigning its address to a pointer variable that is not const-qualified,
> and so it could be used to modify it later on.
>
> A caller to prune_freeze_setup() that sees the signature const
> PruneFreezeResult *presult could pass a pointer to a PruneFreezeResult
> object that is notionally in read-only memory.  But through the
> non-const-qualified pointer you could later modify the pointed-to
> memory, which would be invalid.  The point of propagating the qualifiers
> is to prevent that at compile time.

Thanks for the explanation. I've committed your proposed fix.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-16 16:58  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-16 16:58 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Wed, Dec 3, 2025 at 6:07 PM Melanie Plageman
<[email protected]> wrote:
>
> If we're just talking about the renaming, looking at procarray.c, it
> is full of the word "removable" because its functions were largely
> used to examine and determine if everyone can see an xmax as committed
> and thus if that tuple is removable from their perspective. But
> nothing about the code that I can see means it has to be an xmax. We
> could just as well use the functions to determine if everyone can see
> an xmin as committed.

In the attached v27, I've removed the commit that renamed functions in
procarray.c. I've added a single wrapper GlobalVisTestXidNotRunning()
that is used in my code where I am testing live tuples. I think you'll
find that I've addressed all of your review comments now -- as I've
also gotten rid of the confusing blk_known_av logic through a series
of refactors.

The one outstanding point is which commits should bump
XLOG_PAGE_MAGIC (the reworked patches also still need review).

- Melanie


Attachments:

  [text/x-patch] v27-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (9.6K, 2-v27-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
  download | inline diff:
From eb1a372848e3274d98b129d7f77ca1c11f4dceb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v27 01/14] Combine visibilitymap_set() cases in
 lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would fail
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 35 +++++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 16 ++++
 src/backend/access/heap/vacuumlazy.c          | 95 +++++--------------
 3 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..cbc04aad016 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,41 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (0,0)
+(1 row)
+
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..0d13116248b 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,22 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..811e7e33678 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2094,16 +2094,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
+			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
 		/*
 		 * It should never be the case that the visibility map page is set
@@ -2111,19 +2109,29 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
+		 * before adding it to the WAL chain. The only scenario where it is
+		 * not already dirty is if the VM was removed, and that isn't worth
+		 * optimizing for.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!presult.all_frozen ||
+			   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+									   vmbuffer,
+									   presult.vm_conflict_horizon,
+									   new_vmbits);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
@@ -2191,65 +2199,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v27-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (13.3K, 3-v27-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
  download | inline diff:
From 808d8a5816f0764471ab92d43c57518279cd53c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v27 02/14] Eliminate use of cached VM value in
 lazy_scan_prune()

lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.

Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.

Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The restructuring also makes it possible for the VM to be
newly set after corruption is fixed, if pruning found the page
all-visible.

Author: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 172 ++++++++++++---------------
 1 file changed, 79 insertions(+), 93 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 811e7e33678..436143cd12c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -253,7 +253,6 @@ typedef enum
  * about the block it read to the caller.
  */
 #define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
 
 typedef struct LVRelState
 {
@@ -358,7 +357,6 @@ typedef struct LVRelState
 	/* State maintained by heap_vac_scan_next_block() */
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
-	bool		next_unskippable_allvis;	/* its visibility status */
 	bool		next_unskippable_eager_scanned; /* if it was eagerly scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
 
@@ -432,7 +430,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
+							Buffer vmbuffer,
 							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1249,7 +1247,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	/* Initialize for the first heap_vac_scan_next_block() call */
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
-	vacrel->next_unskippable_allvis = false;
 	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
@@ -1265,13 +1262,13 @@ lazy_scan_heap(LVRelState *vacrel)
 										MAIN_FORKNUM,
 										heap_vac_scan_next_block,
 										vacrel,
-										sizeof(uint8));
+										sizeof(bool));
 
 	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		uint8		blk_info = 0;
+		bool		was_eager_scanned = false;
 		int			ndeleted = 0;
 		bool		has_lpdead_items;
 		void	   *per_buffer_data = NULL;
@@ -1340,13 +1337,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!BufferIsValid(buf))
 			break;
 
-		blk_info = *((uint8 *) per_buffer_data);
+		was_eager_scanned = *((bool *) per_buffer_data);
 		CheckBufferIsPinnedOnce(buf);
 		page = BufferGetPage(buf);
 		blkno = BufferGetBlockNumber(buf);
 
 		vacrel->scanned_pages++;
-		if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+		if (was_eager_scanned)
 			vacrel->eager_scanned_pages++;
 
 		/* Report as block scanned, update error traceback information */
@@ -1417,7 +1414,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
 									   vmbuffer,
-									   blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
 									   &has_lpdead_items, &vm_page_frozen);
 
 		/*
@@ -1434,8 +1430,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * exclude pages skipped due to cleanup lock contention from eager
 		 * freeze algorithm caps.
 		 */
-		if (got_cleanup_lock &&
-			(blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+		if (got_cleanup_lock && was_eager_scanned)
 		{
 			/* Aggressive vacuums do not eager scan. */
 			Assert(!vacrel->aggressive);
@@ -1602,7 +1597,6 @@ heap_vac_scan_next_block(ReadStream *stream,
 {
 	BlockNumber next_block;
 	LVRelState *vacrel = callback_private_data;
-	uint8		blk_info = 0;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
@@ -1665,8 +1659,8 @@ heap_vac_scan_next_block(ReadStream *stream,
 		 * otherwise they would've been unskippable.
 		 */
 		vacrel->current_block = next_block;
-		blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		*((uint8 *) per_buffer_data) = blk_info;
+		/* Block was not eager scanned */
+		*((bool *) per_buffer_data) = false;
 		return vacrel->current_block;
 	}
 	else
@@ -1678,11 +1672,7 @@ heap_vac_scan_next_block(ReadStream *stream,
 		Assert(next_block == vacrel->next_unskippable_block);
 
 		vacrel->current_block = next_block;
-		if (vacrel->next_unskippable_allvis)
-			blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		if (vacrel->next_unskippable_eager_scanned)
-			blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
-		*((uint8 *) per_buffer_data) = blk_info;
+		*((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
 		return vacrel->current_block;
 	}
 }
@@ -1707,7 +1697,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
 	bool		next_unskippable_eager_scanned = false;
-	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
@@ -1717,7 +1706,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 													   next_unskippable_block,
 													   &next_unskippable_vmbuffer);
 
-		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
 		/*
 		 * At the start of each eager scan region, normal vacuums with eager
@@ -1736,7 +1724,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
 		 */
-		if (!next_unskippable_allvis)
+		if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
 			break;
@@ -1793,7 +1781,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
-	vacrel->next_unskippable_allvis = next_unskippable_allvis;
 	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
@@ -1954,9 +1941,7 @@ cmpOffsetNumbers(const void *a, const void *b)
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
  * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1973,7 +1958,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
 				bool *has_lpdead_items,
 				bool *vm_page_frozen)
 {
@@ -1987,6 +1971,8 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
+	uint8		old_vmbits = 0;
+	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2089,70 +2075,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
-		 * before adding it to the WAL chain. The only scenario where it is
-		 * not already dirty is if the VM was removed, and that isn't worth
-		 * optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!presult.all_frozen ||
-			   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer,
-									   presult.vm_conflict_horizon,
-									   new_vmbits);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
+	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
 	/*
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2160,8 +2083,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
 	 * with buffer lock before concluding that the VM is corrupt.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if (!PageIsAllVisible(page) &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2199,6 +2122,69 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
+	if (!presult.all_visible)
+		return presult.ndeleted;
+
+	/* Set the visibility map and page visibility hint */
+	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (presult.all_frozen)
+		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	/* Nothing to do */
+	if (old_vmbits == new_vmbits)
+		return presult.ndeleted;
+
+	Assert(presult.all_visible);
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).  Regardless, set both bits so that we get back in
+	 * sync.
+	 *
+	 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+	 * unnecessarily dirtying the heap buffer, as it must be marked dirty
+	 * before adding it to the WAL chain. The only scenario where it is not
+	 * already dirty is if the VM was removed, and that isn't worth optimizing
+	 * for.
+	 */
+	PageSetAllVisible(page);
+	MarkBufferDirty(buf);
+
+	/*
+	 * If the page is being set all-frozen, we pass InvalidTransactionId as
+	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+	 * everything safe for REDO was logged when the page's tuples were frozen.
+	 */
+	Assert(!presult.all_frozen ||
+		   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+	visibilitymap_set(vacrel->rel, blkno, buf,
+					  InvalidXLogRecPtr,
+					  vmbuffer, presult.vm_conflict_horizon,
+					  new_vmbits);
+
+	/*
+	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
+	 * count it as newly set for logging.
+	 */
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	{
+		vacrel->vm_new_visible_pages++;
+		if (presult.all_frozen)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 presult.all_frozen)
+	{
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
+
 	return presult.ndeleted;
 }
 
-- 
2.43.0
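
The tail of lazy_scan_prune() as restructured in the patch above can be
modeled in isolation. What follows is a minimal sketch, not code from the
patch set: VmCounters and plan_vm_update() are hypothetical names, and the
two SKETCH_* constants are local stand-ins for VISIBILITYMAP_ALL_VISIBLE
and VISIBILITYMAP_ALL_FROZEN (their values here are assumed). It shows only
how old_vmbits and the pruning result decide whether the VM is updated and
which logging counters are bumped. It does not touch buffers, WAL, or the
page-level PD_ALL_VISIBLE flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the visibility map bits (values assumed). */
#define SKETCH_ALL_VISIBLE 0x01
#define SKETCH_ALL_FROZEN  0x02

/* Hypothetical stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct VmCounters
{
    int new_visible_pages;
    int new_visible_frozen_pages;
    int new_frozen_pages;
} VmCounters;

/*
 * Decide which bits to set in the VM for one heap page.
 *
 * Returns the new bits to set, or 0 when there is nothing to do (page not
 * all-visible, or the VM already holds exactly the desired bits).
 */
uint8_t
plan_vm_update(uint8_t old_vmbits, bool all_visible, bool all_frozen,
               VmCounters *counters, bool *vm_page_frozen)
{
    uint8_t new_vmbits;

    /* A page can only be all-frozen if it is also all-visible. */
    assert(!all_frozen || all_visible);

    if (!all_visible)
        return 0;

    new_vmbits = SKETCH_ALL_VISIBLE;
    if (all_frozen)
        new_vmbits |= SKETCH_ALL_FROZEN;

    /* Nothing to do */
    if (old_vmbits == new_vmbits)
        return 0;

    /* Count the page as newly set only for bits that were clear before. */
    if ((old_vmbits & SKETCH_ALL_VISIBLE) == 0)
    {
        counters->new_visible_pages++;
        if (all_frozen)
        {
            counters->new_visible_frozen_pages++;
            *vm_page_frozen = true;
        }
    }
    else if ((old_vmbits & SKETCH_ALL_FROZEN) == 0 && all_frozen)
    {
        counters->new_frozen_pages++;
        *vm_page_frozen = true;
    }

    return new_vmbits;
}
```

In this model, a page that was all-visible but not all-frozen in the VM and
is now found all-frozen bumps only new_frozen_pages, while a page with no
VM bits set bumps new_visible_pages (and new_visible_frozen_pages if it is
also all-frozen).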



  [text/x-patch] v27-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.5K, 4-v27-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
  download | inline diff:
From 026cfe10b79328d6b9f68703dfa9db1b4e7e619d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v27 03/14] Refactor lazy_scan_prune() VM clear logic into
 helper

Encapsulating the VM corruption checks in a helper makes
lazy_scan_prune() clearer. Before we move all of this logic into
heap_page_prune_and_freeze(), we want to make it more compact and clear.
---
 src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
 1 file changed, 78 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 436143cd12c..425dc2f8691 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,6 +428,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page,
+										   int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1935,6 +1940,77 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2077,50 +2153,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(page) &&
-		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
+	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+								   presult.lpdead_items, vmbuffer, old_vmbits);
 
 	if (!presult.all_visible)
 		return presult.ndeleted;
-- 
2.43.0
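
The two checks performed by identify_and_fix_vm_corruption() in the patch
above can be summarized as a small decision function. This is a minimal
sketch under stated assumptions, not code from the patch set:
VmCorruptionKind and classify_vm_corruption() are hypothetical names, and
SKETCH_ALL_VISIBLE is a local stand-in for VISIBILITYMAP_ALL_VISIBLE (value
assumed). It only classifies the state. The real helper additionally emits
a WARNING and clears the VM bits (and, in the second case, the page-level
flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the all-visible VM bit (value assumed). */
#define SKETCH_ALL_VISIBLE 0x01

/* Hypothetical classification of the cases the helper distinguishes. */
typedef enum VmCorruptionKind
{
    VM_CORRUPTION_NONE,
    /* VM bit set but page-level bit clear: clear the VM bits. */
    VM_BIT_SET_PAGE_BIT_CLEAR,
    /* LP_DEAD items on an all-visible page: clear page flag and VM bits. */
    VM_LPDEAD_ON_ALL_VISIBLE_PAGE
} VmCorruptionKind;

VmCorruptionKind
classify_vm_corruption(bool page_all_visible, int nlpdead_items,
                       uint8_t vmbits)
{
    assert(nlpdead_items >= 0);

    /* The VM bit should never be set while the page-level bit is clear. */
    if (!page_all_visible && (vmbits & SKETCH_ALL_VISIBLE) != 0)
        return VM_BIT_SET_PAGE_BIT_CLEAR;

    /* An all-visible page should never contain LP_DEAD items. */
    if (page_all_visible && nlpdead_items > 0)
        return VM_LPDEAD_ON_ALL_VISIBLE_PAGE;

    return VM_CORRUPTION_NONE;
}
```

The two cases are mutually exclusive, since the first requires the
page-level bit to be clear and the second requires it to be set. That
matches the if / else if structure in the helper.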



  [text/x-patch] v27-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.2K, 5-v27-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
  download | inline diff:
From d87e33478520e52fc010071e8dcd6eac5460ec27 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v27 04/14] Set the VM in heap_page_prune_and_freeze()

This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
 src/backend/access/heap/pruneheap.c  | 301 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 142 +------------
 src/include/access/heapam.h          |  21 ++
 3 files changed, 285 insertions(+), 179 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..c979625551c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+								  Relation relation,
+								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+								  Buffer vmbuffer,
+								  int nlpdead_items,
+								  uint8 *old_vmbits,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is no longer maintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -775,10 +795,134 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	if (!prstate->attempt_update_vm)
+		return false;
+
+	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
+										   &vmbuffer);
+
+	/* We do this even if not all-visible */
+	identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+								   nlpdead_items, vmbuffer,
+								   *old_vmbits);
+
+	if (!prstate->all_visible)
+		return false;
+
+	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->all_frozen)
+		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (*new_vmbits == *old_vmbits)
+	{
+		*new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -793,12 +937,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +968,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1011,6 +1162,64 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	/* Set the visibility map and page visibility hint, if relevant */
+	if (do_set_vm)
+	{
+		Assert(prstate.all_visible);
+
+		/*
+		 * It should never be the case that the visibility map bit is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
+		 * before adding it to the WAL chain. The only scenario where it is
+		 * not already dirty is if the VM was removed, and that isn't worth
+		 * optimizing for.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!prstate.all_frozen ||
+			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+		visibilitymap_set(params->relation, blockno, buffer,
+						  InvalidXLogRecPtr,
+						  vmbuffer, presult->vm_conflict_horizon,
+						  new_vmbits);
+	}
+
+	/* Save the vmbits for caller */
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = new_vmbits;
 }
 
 
@@ -1485,6 +1694,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 425dc2f8691..ccfad5b2dba 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,11 +428,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1940,77 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2042,13 +1966,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2148,73 +2071,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer, old_vmbits);
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	Assert(presult.all_visible);
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled).  Regardless, set both bits so that we get back in
-	 * sync.
-	 *
-	 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-	 * unnecessarily dirtying the heap buffer, as it must be marked dirty
-	 * before adding it to the WAL chain. The only scenario where it is not
-	 * already dirty is if the VM was removed, and that isn't worth optimizing
-	 * for.
-	 */
-	PageSetAllVisible(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
 	/*
 	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
 	 * count it as newly set for logging.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if (presult.all_frozen)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..f3fa61c9c1b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block of
+	 * the visibility map if we are to update it.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits is the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits is the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
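
To make the control flow of the patch above easier to follow, here is a rough, self-contained model of the decision made in heap_page_will_set_vm() and of the bookkeeping lazy_scan_prune() then does with old_vmbits and new_vmbits. It is a sketch under stated assumptions, not the patch's code: sketch_will_set_vm, sketch_count, SketchCounters and the SKETCH_VM_* constants are hypothetical names, the old VM bits are passed in rather than read with visibilitymap_get_status(), the corruption check is omitted, and the sketch zeroes *new_vmbits itself where the patch relies on the caller's initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VISIBILITYMAP_* flag bits (values assumed). */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/*
 * Decide whether the VM needs to be set.  Returns true and fills *new_vmbits
 * only when the caller asked for VM updates, the page is all-visible, and the
 * desired bits differ from the bits already in the VM.  Otherwise *new_vmbits
 * is left at 0, matching Assert(do_set_vm || new_vmbits == 0) in the patch.
 */
static bool
sketch_will_set_vm(bool attempt_update_vm, bool all_visible, bool all_frozen,
				   uint8_t old_vmbits, uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	if (!attempt_update_vm || !all_visible)
		return false;

	*new_vmbits = SKETCH_VM_ALL_VISIBLE;
	if (all_frozen)
		*new_vmbits |= SKETCH_VM_ALL_FROZEN;

	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;		/* nothing to do */
		return false;
	}

	return true;
}

/* Counters loosely modelled on the vm_new_* fields used for VACUUM logging. */
typedef struct
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} SketchCounters;

/* Count a page as newly all-visible and/or newly all-frozen. */
static void
sketch_count(SketchCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0)
	{
		c->new_visible++;
		if ((new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
			c->new_visible_frozen++;
	}
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0)
	{
		/* a page can only become all-frozen if it is also all-visible */
		assert((new_vmbits & SKETCH_VM_ALL_VISIBLE) != 0);
		c->new_frozen++;
	}
}
```

Because new_vmbits stays 0 whenever the VM is not touched, the counting code can test new_vmbits directly instead of consulting presult.all_visible and presult.all_frozen, which is what lets the next patch in the series drop those fields.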



  [text/x-patch] v27-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v27-0005-Move-VM-assert-into-prune-freeze-code.patch)
From ef7ba68de0fa62c11c9f71e8ce1c577efa81d0ee Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v27 05/14] Move VM assert into prune/freeze code

This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
 src/backend/access/heap/pruneheap.c  | 86 ++++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 68 +---------------------
 src/include/access/heapam.h          | 25 +++-----
 3 files changed, 77 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c979625551c..0ca16340e3e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -918,6 +918,31 @@ heap_page_will_set_vm(PruneState *prstate,
 	return true;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -971,6 +996,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1129,23 +1155,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1163,6 +1174,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to include it again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
 	/* Now update the visibility map and PD_ALL_VISIBLE hint */
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
@@ -1208,12 +1259,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * make everything safe for REDO was logged when the page's tuples
 		 * were frozen.
 		 */
-		Assert(!prstate.all_frozen ||
-			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
 
 		visibilitymap_set(params->relation, blockno, buffer,
 						  InvalidXLogRecPtr,
-						  vmbuffer, presult->vm_conflict_horizon,
+						  vmbuffer, vm_conflict_horizon,
 						  new_vmbits);
 	}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ccfad5b2dba..3fa03470722 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -462,20 +462,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2012,32 +1998,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3494,29 +3454,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3540,15 +3477,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f3fa61c9c1b..9100d42ccbb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v27-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.3K, 7-v27-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From bfd883e25cc43eee9e03e98912fa72364ecebc81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v27 06/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
 1 file changed, 157 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0ca16340e3e..e74c2e06226 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid);
 
 
 /*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		do_set_vm &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use that xmax as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Helper to correct any corruption detected on an heap page and its
  * corresponding visibility map page after pruning but before setting the
@@ -996,7 +1063,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1004,10 +1070,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 	uint8		new_vmbits = 0;
 	uint8		old_vmbits = 0;
 
-
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
@@ -1068,6 +1134,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									old_vmbits, new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1089,14 +1186,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1110,6 +1210,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * Even if we are only setting the VM and PD_ALL_VISIBLE is
+			 * already set, we don't need to worry about unnecessarily
+			 * dirtying the heap buffer below, as it must be marked dirty
+			 * before adding it to the WAL chain. The only scenario where it
+			 * is not already dirty is if the VM was removed, and that isn't
+			 * worth optimizing for.
+			 */
+			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+									 params->relation->rd_locator);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1117,29 +1237,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1149,43 +1252,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1200,7 +1268,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1210,66 +1279,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	do_set_vm = heap_page_will_set_vm(&prstate,
-									  params->relation,
-									  blockno,
-									  buffer,
-									  page,
-									  vmbuffer,
-									  prstate.lpdead_items,
-									  &old_vmbits,
-									  &new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	/* Set the visibility map and page visibility hint, if relevant */
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		Assert(prstate.all_visible);
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer, as it must be marked dirty
-		 * before adding it to the WAL chain. The only scenario where it is
-		 * not already dirty is if the VM was removed, and that isn't worth
-		 * optimizing for.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
-		visibilitymap_set(params->relation, blockno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, vm_conflict_horizon,
-						  new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	/* Save the vmbits for caller */
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = new_vmbits;
 }
 
 
-- 
2.43.0
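
The horizon selection in get_conflict_xid() above can be read in isolation. What follows is a minimal standalone sketch, not the patch's code. The typedef, the constants, and the names xid_follows() and model_conflict_xid() are stand-ins defined here so the snippet compiles without PostgreSQL headers. xid_follows() is an assumed model of TransactionIdFollows(), and the assert() is a sanity check added for the sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL definitions so the sketch is self-contained. */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02

/* Assumed model of TransactionIdFollows(): "a is newer than b", modulo 2^32. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical mirror of the decision order in get_conflict_xid(). */
static TransactionId
model_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
				   uint8_t old_vmbits, uint8_t new_vmbits,
				   TransactionId latest_xid_removed,
				   TransactionId frz_conflict_horizon,
				   TransactionId visibility_cutoff_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Sketch-only check: no bits other than the two VM flags are passed. */
	assert((new_vmbits & ~(VM_ALL_VISIBLE | VM_ALL_FROZEN)) == 0);

	/* Already all-visible page that is only being set all-frozen. */
	if (!do_prune && !do_freeze && do_set_vm &&
		(old_vmbits & VM_ALL_VISIBLE) != 0 &&
		(new_vmbits & VM_ALL_FROZEN) != 0)
		return InvalidTransactionId;

	/* A VM update takes precedence over the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A removed tuple with a newer xmax widens the horizon. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

For example, with do_set_vm true, visibility_cutoff_xid = 100 and latest_xid_removed = 150, the sketch returns 150, because the removed tuple's xmax is newer than the visibility cutoff.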



  [text/x-patch] v27-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v27-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 505f5ed860559b4c45c53aee6fd8355e1bfd4ea8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v27 07/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3fa03470722..210afa11346 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1873,9 +1873,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1892,13 +1895,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
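
The ordering that the hunk above establishes for empty pages (lock the VM buffer, then make the heap and VM changes and log them together, then unlock) can be mimicked with a toy model. This is only an illustration under stated assumptions: ToyHeapPage, ToyVmPage, and toy_mark_empty_page_frozen() are hypothetical names, there is no real buffer manager or WAL here, and the optional log_newpage_buffer() record is ignored. The toy also assumes that setting the VM bits dirties the VM buffer, which is suggested by the Assert(BufferIsDirty(vmbuffer)) in the replay code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02

/* Toy stand-ins for a heap page and its visibility map page. */
typedef struct ToyHeapPage
{
	bool		pd_all_visible;
	bool		dirty;
} ToyHeapPage;

typedef struct ToyVmPage
{
	uint8_t		bits;
	bool		locked;
	bool		dirty;
} ToyVmPage;

/* Returns the number of (pretend) WAL records emitted. */
static int
toy_mark_empty_page_frozen(ToyHeapPage *heap, ToyVmPage *vm, bool needs_wal)
{
	int			nrecords = 0;

	if (heap->pd_all_visible)
		return 0;				/* nothing to do */

	vm->locked = true;			/* lock the VM buffer before the critical section */

	/* --- critical section begins --- */
	heap->dirty = true;			/* dirty the heap buffer before logging */
	heap->pd_all_visible = true;	/* page-level hint */
	vm->bits |= VM_ALL_VISIBLE | VM_ALL_FROZEN;
	vm->dirty = true;
	if (needs_wal)
		nrecords++;				/* one record covers the heap page and the VM */
	/* --- critical section ends --- */

	vm->locked = false;

	/* The VM bit must never be set while the page-level bit is clear. */
	assert(!(vm->bits & VM_ALL_VISIBLE) || heap->pd_all_visible);

	return nrecords;
}
```

Calling the function a second time on the same toy page returns 0, which matches the !PageIsAllVisible(page) guard around the real code.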



  [text/x-patch] v27-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.3K, 9-v27-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 02a2153aaa4be45d868047e6100ea1ca47f5d7e2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v27 08/14] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 45 insertions(+), 374 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e74c2e06226..8568587af4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1226,8 +1226,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * worth optimizing for.
 			 */
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
-									 params->relation->rd_locator);
+			visibilitymap_set(blockno, vmbuffer, new_vmbits,
+							  params->relation->rd_locator);
 		}
 
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 210afa11346..87820f3ff49 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1895,11 +1895,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v27-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.6K, 10-v27-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From d301117f52d6a6e78fbdafbbb2c0c4dd62b5b861 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v27 09/14] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
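Reader's note (not part of the commit): the once-per-page check described
in the message above can be illustrated with a toy model. This is a minimal
sketch under stated assumptions, not the patch's code. ToyXid, ToyVisState
and the toy_* functions are hypothetical stand-ins for TransactionId,
GlobalVisState and GlobalVisTestXidNotRunning(), and the horizon is modeled
as a single XID, which the real GlobalVisState is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;		/* hypothetical stand-in for TransactionId */

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

/* Hypothetical stand-in for GlobalVisState: one horizon plus a test counter */
typedef struct ToyVisState
{
	ToyXid		horizon;		/* XIDs preceding this are not running */
	unsigned	ntests;			/* number of "expensive" horizon tests made */
} ToyVisState;

/* Modulo-2^32 comparison, valid for normal XIDs only */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* The "expensive" test; counted so callers can see how often it runs */
static bool
toy_xid_not_running(ToyVisState *state, ToyXid xid)
{
	state->ntests++;
	return toy_xid_precedes(xid, state->horizon);
}

/*
 * Track the newest normal xmin while scanning, then test only that XID
 * against the horizon. If the newest xmin is older than the horizon,
 * every other xmin on the page is too.
 */
static bool
toy_page_all_visible(ToyVisState *state, const ToyXid *xmins, size_t ntuples)
{
	ToyXid		newest = TOY_INVALID_XID;

	assert(xmins != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		/* skip non-normal XIDs; they never limit visibility in this model */
		if (xmins[i] < TOY_FIRST_NORMAL_XID)
			continue;

		if (newest == TOY_INVALID_XID || toy_xid_precedes(newest, xmins[i]))
			newest = xmins[i];
	}

	/* no normal xmin seen: nothing to test */
	if (newest == TOY_INVALID_XID)
		return true;

	return toy_xid_not_running(state, newest);
}
```

In this model, a page with xmins {10, 42, 99} and a horizon of 100 is
reported all-visible after a single horizon test. A page with xmins
{10, 100, 42} is reported not all-visible, also after a single test.
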
 src/backend/access/heap/heapam_visibility.c | 50 +++++++++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 ++++++++----------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 13 +++---
 4 files changed, 82 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..6bcd8b6d017 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,56 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID is no longer considered running by
+ * any snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the tuple’s commit status. Its purpose is purely
+ * semantic: when applied to live tuples, GlobalVisTestIsRemovableXid() is
+ * checking whether the inserting transaction is still considered running,
+ * not whether the tuple is removable. Live tuples are, by definition, not
+ * removable, but the snapshot criteria for “transaction still running” are
+ * identical to those used for deletion XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidNotRunning(GlobalVisState *state, TransactionId xid)
+{
+	return GlobalVisTestIsRemovableXid(state, xid);
+}
+
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisTestXidNotRunning(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8568587af4a..08ffe511d03 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -994,14 +995,13 @@ heap_page_will_set_vm(PruneState *prstate,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1088,6 +1088,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisTestXidNotRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1269,10 +1279,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1801,20 +1810,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 87820f3ff49..3b8c9dbdb4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3491,7 +3491,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3507,7 +3507,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3581,7 +3581,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3600,7 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisTestXidNotRunning(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9100d42ccbb..e2ee035ae0b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -272,10 +272,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -439,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -453,6 +452,10 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidNotRunning(GlobalVisState *state, TransactionId xid);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v27-0010-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 11-v27-0010-Unset-all_visible-sooner-if-not-freezing.patch)
From e3dd1db8931e00d09d1c29d399f56434146beab3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v27 10/14] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 08ffe511d03..3d34532b766 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1667,8 +1667,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1927,8 +1932,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v27-0011-Track-which-relations-are-modified-by-a-query.patch (2.5K, 12-v27-0011-Track-which-relations-are-modified-by-a-query.patch)
From 48da46f219ac3f4c09b4cb6df23a31544921087e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v27 11/14] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
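
The bookkeeping in the patch above can be sketched in isolation. This is an illustration only, assuming a 64-bit word as a stand-in for `Bitmapset`; every `Sketch`/`sketch_` name is hypothetical, and only the field name `es_modified_relids` and the two places that add members (result relations and rowmarks) come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: handles RT indexes 1..63 only. */
typedef uint64_t SketchRelidSet;

SketchRelidSet
sketch_add_member(SketchRelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

bool
sketch_is_member(int rti, SketchRelidSet set)
{
	assert(rti > 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* Hypothetical pared-down EState holding only the new field. */
typedef struct SketchEState
{
	SketchRelidSet es_modified_relids;
} SketchEState;

/* Result relations (INSERT/UPDATE/DELETE/MERGE targets) count as modified. */
void
sketch_init_result_relation(SketchEState *estate, int rti)
{
	estate->es_modified_relids =
		sketch_add_member(estate->es_modified_relids, rti);
}

/* Relations with a rowmark (e.g. SELECT FOR UPDATE) count as modified too. */
void
sketch_init_rowmark(SketchEState *estate, int rti)
{
	estate->es_modified_relids =
		sketch_add_member(estate->es_modified_relids, rti);
}

/* A scan may treat its relation as read-only if it was never recorded. */
bool
sketch_scan_is_read_only(const SketchEState *estate, int scanrelid)
{
	return !sketch_is_member(scanrelid, estate->es_modified_relids);
}
```

With RT index 1 as a result relation and RT index 3 carrying a rowmark, only a scan of RT index 2 would be considered read-only in this sketch.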



  [text/x-patch] v27-0012-Pass-down-information-on-table-modification-to-s.patch (23.7K, 13-v27-0012-Pass-down-information-on-table-modification-to-s.patch)
From 2b371cbe252e262f2fdf68b9f507f3d5f401628e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v27 12/14] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap heap
scan nodes on whether the query modifies the relation being scanned.
The information travels as a new SO_HINT_REL_READ_ONLY scan option in a
new flags argument to the scan-begin functions. A later commit will use
it to update the VM during on-access pruning only if the relation is not
modified by the query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 93 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2ee035ae0b..38294b33fac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
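
The flag plumbing in the patch above follows one pattern throughout: callers pass in a hint, the scan-begin wrapper ORs in its own mandatory options instead of overwriting them, and the heap AM derives `modifies_base_rel` from the absence of the hint. A minimal sketch of that pattern follows. The `SKETCH_` names are hypothetical; only the value `1 << 10` for the read-only hint is taken from the diff, and the other bit positions are placeholders chosen so that they do not collide with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder option bits; only the read-only hint's value is from the patch. */
enum
{
	SKETCH_SO_TYPE_SEQSCAN = 1 << 0,
	SKETCH_SO_ALLOW_STRAT = 1 << 6,
	SKETCH_SO_ALLOW_SYNC = 1 << 7,
	SKETCH_SO_ALLOW_PAGEMODE = 1 << 8,
	SKETCH_SO_HINT_REL_READ_ONLY = 1 << 10
};

/*
 * Shape of the changed table_beginscan(): keep whatever the caller passed
 * and add the options every sequential scan needs.
 */
uint32_t
sketch_seqscan_flags(uint32_t flags)
{
	flags |= SKETCH_SO_TYPE_SEQSCAN |
		SKETCH_SO_ALLOW_STRAT | SKETCH_SO_ALLOW_SYNC | SKETCH_SO_ALLOW_PAGEMODE;
	return flags;
}

/*
 * Shape of the assignment in heapam_index_fetch_begin(): without the hint,
 * conservatively assume the query modifies the relation.
 */
bool
sketch_modifies_base_rel(uint32_t flags)
{
	return !(flags & SKETCH_SO_HINT_REL_READ_ONLY);
}
```

Because the default is "modifies", every call site that passes 0 in the diff keeps its current behavior, and only the executor scan nodes that consult `es_modified_relids` opt in.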



  [text/x-patch] v27-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.3K, 14-v27-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 519f2e4ee947d35edd6182850a26988744343ed4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v27 13/14] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++++-
 src/backend/access/heap/pruneheap.c           | 44 +++++++++++++++++--
 src/include/access/heapam.h                   | 24 ++++++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 90 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3d34532b766..393dff5ab3d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  Relation relation,
 								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 								  Buffer vmbuffer,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -945,6 +958,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
  * corrupted, it will fix them by clearing the VM bits and visibility hint.
  * This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * current value of the VM bits in *old_vmbits and the desired new value of
  * the VM bits in *new_vmbits.
@@ -954,6 +970,8 @@ heap_page_will_set_vm(PruneState *prstate,
 					  Relation relation,
 					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 					  Buffer vmbuffer,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
 					  int nlpdead_items,
 					  uint8 *old_vmbits,
 					  uint8 *new_vmbits)
@@ -961,6 +979,24 @@ heap_page_will_set_vm(PruneState *prstate,
 	if (!prstate->attempt_update_vm)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
 										   &vmbuffer);
 
@@ -1156,6 +1192,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  buffer,
 									  page,
 									  vmbuffer,
+									  params->reason,
+									  do_prune, do_freeze,
 									  prstate.lpdead_items,
 									  &old_vmbits,
 									  &new_vmbits);
@@ -1242,9 +1280,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		MarkBufferDirty(buffer);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
+		/* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did */
 		if (RelationNeedsWAL(params->relation))
 		{
 			log_heap_prune_and_freeze(params->relation, buffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 38294b33fac..6ed681b815c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v27-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 15-v27-0014-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From eba0bfcd80c64a2c89e631db77f1afb3090de471 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v27 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-17 18:27  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-12-17 18:27 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

Hi!

in v27-0001:
> Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
> > ```
> > evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
> >  blkno | all_visible | all_frozen
> > -------+-------------+------------
> >      0 | t           | t
> > (1 row)
> > ```

> I've done this. I've actually added three such verifications -- one
> after each step where the VM is expected to change. It shouldn't be
> very expensive, so I think it is okay. The way the test would fail if
> the buffer wasn't correctly dirtied is that it would assert out -- so
> the visibility map test wouldn't even have a chance to fail. But, I
> think it is also okay to confirm that the expected things are
> happening with the VM -- it just gives us extra coverage.

+1 on extra coverage. Should we also do a SQL-level check that the VM
indeed does not need to set PD_ALL_VISIBLE (checking the header bytes
using pageinspect)?


v27-0003 & v27-0004: I did not understand exactly why we introduce
`identify_and_fix_vm_corruption` in 0003 and then move the code to
another place in 0004. I can see this has been the case since v25 of
the patch set. Well, maybe this is not an issue at all...


In v27-0005: this patch changes code that is not exercised in
tests[0]. I spent some time trying to understand the conditions under
which we enter it. There is a comment about unfinished relation
extension, but I had no success reproducing this. I ended up
modifying the code to lose PageSetAllVisible in the proper places and
running vacuum. Everything looks to work as expected. I will spend some
more time on this; maybe I will succeed in writing an
injection-point-based TAP test that hits this...



[0] https://coverage.postgresql.org/src/backend/access/heap/vacuumlazy.c.gcov.html#1902
-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 00:30  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-12-18 00:30 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

Thanks for the review!

In addition to addressing your feedback, attached v28 includes a
number of small fixes to comments, commit messages, and other things.
Notably, I've added one new refactoring patch 0009, which reduces the
diff of 0010 -- using the GlobalVisState instead of OldestXmin for
page visibility -- even further.

On Wed, Dec 17, 2025 at 1:27 PM Kirill Reshke <[email protected]> wrote:
>
> > I've done this. I've actually added three such verifications -- one
> > after each step where the VM is expected to change. It shouldn't be
> > very expensive, so I think it is okay. The way the test would fail if
> > the buffer wasn't correctly dirtied is that it would assert out -- so
> > the visibility map test wouldn't even have a chance to fail. But, I
> > think it is also okay to confirm that the expected things are
> > happening with the VM -- it just gives us extra coverage.
>
> +1 on extra coverage. Should we also do a SQL-level check that the VM
> indeed does not need to set PD_ALL_VISIBLE (checking the header bytes
> using pageinspect)?

That's an interesting idea. I checked and, AFAICT, there are no tests
currently directly comparing the flags column returned by the
pageinspect page_header() function to one of the flag values. I've
added the following to attached v28.

SELECT (flags & x'0004'::int) <> 0
        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));

But I'm not sure if it is weird/confusing to be comparing the flag
directly to the number 4 like this. I don't really want to bother with
adding another function to pageinspect returning the status of
PD_ALL_VISIBLE (like page_visible() or something).
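
For reference, the literal x'0004' in the query above is the value of
PD_ALL_VISIBLE in src/include/storage/bufpage.h. A minimal standalone
sketch of the same bit test in C follows. The helper name
page_flags_all_visible() is hypothetical and exists only for
illustration; it is not a PostgreSQL or pageinspect function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page header flag bits, mirroring the values in bufpage.h */
#define PD_HAS_FREE_LINES	0x0001	/* are there any unused line pointers? */
#define PD_PAGE_FULL		0x0002	/* not enough free space for new tuple? */
#define PD_ALL_VISIBLE		0x0004	/* all tuples on page are visible to everyone */

/*
 * Hypothetical helper (an assumption for illustration): returns true if
 * the pd_flags value, as reported in the "flags" column of page_header(),
 * has the PD_ALL_VISIBLE bit set. Equivalent to the SQL expression
 * (flags & x'0004'::int) <> 0.
 */
bool
page_flags_all_visible(uint16_t pd_flags)
{
	return (pd_flags & PD_ALL_VISIBLE) != 0;
}
```

Masking rather than comparing for equality matters here because other
hint bits, such as PD_HAS_FREE_LINES, may be set in the same field.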

> v27-0003 & v27-0004: I did not understand exactly why we introduce
> `identify_and_fix_vm_corruption` in 0003 and then move the code to
> another place in 0004. I can see this has been the case since v25 of
> the patch set. Well, maybe this is not an issue at all...

It's mostly for ease of review. This is a pretty sensitive area of
code, so I thought it would be easier for the reviewer to confirm
correctness if I split it up. Andres had mentioned that the commit was
hard to review because so many different things were happening.

In v27, 0003 moves the VM clear code into a helper. 0004 and 0005
move all the VM setting/clearing code to
heap_page_prune_and_freeze(). And 0006 actually sets the VM in the
same critical section as pruning/freezing and emits a single WAL
record.

I'm not really sure which commits should stay independent in the final
version I push to master.

> In v27-0005: this patch changes code that is not exercised in
> tests[0]. I spent some time trying to understand the conditions under
> which we enter it. There is a comment about unfinished relation
> extension, but I had no success reproducing this. I ended up
> modifying the code to lose PageSetAllVisible in the proper places and
> running vacuum. Everything looks to work as expected. I will spend some
> more time on this; maybe I will succeed in writing an
> injection-point-based TAP test that hits this...

Based on the coverage report link you provided, that code is changed
by v27 0007, not 0005. 0005 is about moving an assertion out of
lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
question).

Regarding 0007, it looks like what is uncovered (the orange bits in
the coverage report are uncovered, I assume) is empty pages _without_
PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
called except vacuum and COPY FREEZE.

If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
causing us to vacuum an all-frozen empty page.

Then the question is, why wouldn't we have coverage of the empty page
first being set all-visible/all-frozen? It can't be COPY FREEZE
because the page is empty. And it can't be vacuum, because then we
would have coverage. It's very mysterious.

It would be good to have coverage for this case. I don't think you'll
need an injection point for the main case of "empty page not yet set
all-visible is vacuumed for the first time" (unless I'm
misunderstanding something).

I'm not sure how you'll test the "vacuuming an empty, previously
uninitialized page" case described in this comment, though.

             * It's possible that another backend has extended the heap,
             * initialized the page, and then failed to WAL-log the page due
             * to an ERROR.  Since heap extension is not WAL-logged, recovery
             * might try to replay our record setting the page all-visible and
             * find that the page isn't initialized, which will cause a PANIC.
             * To prevent that, check whether the page has been previously
             * WAL-logged, and if not, do that now.

You'd want to force an error during relation extension and then vacuum
the page. I don't know if you need an injection point to force the
error -- depends on what kind of error, I think.
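
To make the condition in that comment concrete, here is a simplified
standalone model of the "has this page ever been WAL-logged?" check. It
assumes, as the vacuum code does, that a page whose LSN is still invalid
has never been WAL-logged. MockPage and empty_page_needs_wal_log() are
hypothetical stand-ins for illustration only; the real code operates on
a Page via PageGetLSN() and RelationNeedsWAL().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types (assumptions, not the real ones) */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct MockPage
{
	XLogRecPtr	lsn;			/* models the value PageGetLSN() would return */
} MockPage;

/*
 * Hypothetical helper: an empty page must be WAL-logged before its
 * all-visible status is logged if the relation is WAL-logged at all and
 * the page has never been stamped with an LSN. Otherwise replay could
 * find an uninitialized page.
 */
bool
empty_page_needs_wal_log(const MockPage *page, bool rel_needs_wal)
{
	return rel_needs_wal && page->lsn == InvalidXLogRecPtr;
}
```

A test for this path would need to leave a page initialized in shared
buffers with an invalid LSN, which is what an ERROR between extension
and WAL-logging would produce.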

So that I know for attribution, did you review 0003-0005?

- Melanie


Attachments:

  [text/x-patch] v28-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (10.0K, 2-v28-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
  download | inline diff:
From ee7c5f860799f195644e2fedf2b63b6789045cbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v28 01/15] Combine visibilitymap_set() cases in
 lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Srinath Reddy Sadipiralla <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 44 ++++++++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 20 +++++
 src/backend/access/heap/vacuumlazy.c          | 87 ++++---------------
 3 files changed, 82 insertions(+), 69 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 --
 -- recently-dropped table
 --
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column? 
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 
 --
 -- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		/*
 		 * It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
+		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+		 * removed -- and that isn't worth optimizing for. And if we add the
+		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+		 * it must be marked dirty.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!presult.all_frozen ||
+			   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v28-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (13.6K, 3-v28-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
  download | inline diff:
From 0dac7060ae0eddc2617a1919150757a7e63924f3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v28 02/15] Eliminate use of cached VM value in
 lazy_scan_prune()

lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.

Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also simplifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, so logic that depends on one would be harder to reason about.
Eliminating it also enables us to detect and repair VM corruption
on-access.

Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The restructuring also means that, after fixing corruption,
the VM can be newly set in the same call if pruning found the page
all-visible.

Author: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 178 ++++++++++++---------------
 1 file changed, 79 insertions(+), 99 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..9d2523a55b2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
  */
 #define EAGER_SCAN_REGION_SIZE 4096
 
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
 typedef struct LVRelState
 {
 	/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
 	/* State maintained by heap_vac_scan_next_block() */
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
-	bool		next_unskippable_allvis;	/* its visibility status */
 	bool		next_unskippable_eager_scanned; /* if it was eagerly scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
 
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
+							Buffer vmbuffer,
 							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	/* Initialize for the first heap_vac_scan_next_block() call */
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
-	vacrel->next_unskippable_allvis = false;
 	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
 										MAIN_FORKNUM,
 										heap_vac_scan_next_block,
 										vacrel,
-										sizeof(uint8));
+										sizeof(bool));
 
 	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		uint8		blk_info = 0;
+		bool		was_eager_scanned = false;
 		int			ndeleted = 0;
 		bool		has_lpdead_items;
 		void	   *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!BufferIsValid(buf))
 			break;
 
-		blk_info = *((uint8 *) per_buffer_data);
+		was_eager_scanned = *((bool *) per_buffer_data);
 		CheckBufferIsPinnedOnce(buf);
 		page = BufferGetPage(buf);
 		blkno = BufferGetBlockNumber(buf);
 
 		vacrel->scanned_pages++;
-		if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+		if (was_eager_scanned)
 			vacrel->eager_scanned_pages++;
 
 		/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
 									   vmbuffer,
-									   blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
 									   &has_lpdead_items, &vm_page_frozen);
 
 		/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * exclude pages skipped due to cleanup lock contention from eager
 		 * freeze algorithm caps.
 		 */
-		if (got_cleanup_lock &&
-			(blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+		if (got_cleanup_lock && was_eager_scanned)
 		{
 			/* Aggressive vacuums do not eager scan. */
 			Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
 {
 	BlockNumber next_block;
 	LVRelState *vacrel = callback_private_data;
-	uint8		blk_info = 0;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
 		 * otherwise they would've been unskippable.
 		 */
 		vacrel->current_block = next_block;
-		blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		*((uint8 *) per_buffer_data) = blk_info;
+		/* Block was not eager scanned */
+		*((bool *) per_buffer_data) = false;
 		return vacrel->current_block;
 	}
 	else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
 		Assert(next_block == vacrel->next_unskippable_block);
 
 		vacrel->current_block = next_block;
-		if (vacrel->next_unskippable_allvis)
-			blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		if (vacrel->next_unskippable_eager_scanned)
-			blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
-		*((uint8 *) per_buffer_data) = blk_info;
+		*((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
 		return vacrel->current_block;
 	}
 }
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
 	bool		next_unskippable_eager_scanned = false;
-	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 													   next_unskippable_block,
 													   &next_unskippable_vmbuffer);
 
-		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
 		/*
 		 * At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
 		 */
-		if (!next_unskippable_allvis)
+		if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
 			break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
-	vacrel->next_unskippable_allvis = next_unskippable_allvis;
 	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
  * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
 				bool *has_lpdead_items,
 				bool *vm_page_frozen)
 {
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
+	uint8		old_vmbits = 0;
+	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
-		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
-		 * removed -- and that isn't worth optimizing for. And if we add the
-		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
-		 * it must be marked dirty.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!presult.all_frozen ||
-			   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
+	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
 	/*
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
 	 * with buffer lock before concluding that the VM is corrupt.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if (!PageIsAllVisible(page) &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2198,6 +2115,69 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
+	if (!presult.all_visible)
+		return presult.ndeleted;
+
+	/* Set the visibility map and page visibility hint */
+	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (presult.all_frozen)
+		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	/* Nothing to do */
+	if (old_vmbits == new_vmbits)
+		return presult.ndeleted;
+
+	Assert(presult.all_visible);
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).  Regardless, set both bits so that we get back in
+	 * sync.
+	 *
+	 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+	 * unnecessarily dirtying the heap buffer. Nearly the only scenario where
+	 * PD_ALL_VISIBLE is set but the VM is not is if the VM was removed -- and
+	 * that isn't worth optimizing for. And if we add the heap buffer to the
+	 * WAL chain (without passing REGBUF_NO_CHANGES), it must be marked dirty.
+	 */
+	PageSetAllVisible(page);
+	MarkBufferDirty(buf);
+
+	/*
+	 * If the page is being set all-frozen, we pass InvalidTransactionId as
+	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+	 * everything safe for REDO was logged when the page's tuples were frozen.
+	 */
+	Assert(!presult.all_frozen ||
+		   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+	visibilitymap_set(vacrel->rel, blkno, buf,
+					  InvalidXLogRecPtr,
+					  vmbuffer, presult.vm_conflict_horizon,
+					  new_vmbits);
+
+	/*
+	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
+	 * count it as newly set for logging.
+	 */
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	{
+		vacrel->vm_new_visible_pages++;
+		if (presult.all_frozen)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 presult.all_frozen)
+	{
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
+	}
+
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v28-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.5K, 4-v28-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
From c23840a12ff14eeffb5116c2cfd34e34e3987b02 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v28 03/15] Refactor lazy_scan_prune() VM clear logic into
 helper

Encapsulating the VM corruption checks in a helper makes the whole
function clearer. There is no functional change other than moving the
code into a helper.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
 1 file changed, 78 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9d2523a55b2..ff34a99edbd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page,
+										   int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1928,6 +1933,77 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,50 +2146,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(page) &&
-		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
+	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+								   presult.lpdead_items, vmbuffer, old_vmbits);
 
 	if (!presult.all_visible)
 		return presult.ndeleted;
-- 
2.43.0
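
The two conditions that identify_and_fix_vm_corruption() distinguishes can be illustrated with a small standalone classifier. This is a sketch under stated assumptions, not the patch's code: classify_vm_corruption() and the VmCorruptionKind enum are hypothetical names, and the repair actions themselves (clearing the VM bits, clearing PD_ALL_VISIBLE, dirtying the heap buffer) are only described in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from PostgreSQL's visibilitymapdefs.h so the sketch is standalone. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01

/* Hypothetical labels for the outcomes the helper distinguishes. */
typedef enum VmCorruptionKind
{
	VM_CONSISTENT,				/* nothing to repair */
	VM_SET_BUT_PAGE_BIT_CLEAR,	/* helper clears the VM bits only */
	VM_LPDEAD_ON_ALL_VISIBLE	/* helper clears PD_ALL_VISIBLE and VM bits */
} VmCorruptionKind;

/*
 * Hypothetical classifier mirroring the helper's if / else-if chain.
 * page_all_visible stands in for PageIsAllVisible(heap_page).
 */
static VmCorruptionKind
classify_vm_corruption(bool page_all_visible, uint8_t vmbits,
					   int nlpdead_items)
{
	assert(nlpdead_items >= 0);

	/* The VM bit must never be set while the page-level bit is clear. */
	if (!page_all_visible && (vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
		return VM_SET_BUT_PAGE_BIT_CLEAR;

	/* LP_DEAD items must never appear on a page marked all-visible. */
	if (page_all_visible && nlpdead_items > 0)
		return VM_LPDEAD_ON_ALL_VISIBLE;

	return VM_CONSISTENT;
}
```

Note that the reverse of the first case, PD_ALL_VISIBLE set while the VM bit is clear, is not treated as corruption here, which matches the comment in the patch saying that state is allowed.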



  [text/x-patch] v28-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.4K, 5-v28-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 014abb83438cf3a3600f34b1060bca430f572275 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v28 04/15] Set the VM in heap_page_prune_and_freeze()

This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
 src/backend/access/heap/pruneheap.c  | 302 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 142 +------------
 src/include/access/heapam.h          |  21 ++
 3 files changed, 286 insertions(+), 179 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..14d40476be9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+								  Relation relation,
+								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+								  Buffer vmbuffer,
+								  int nlpdead_items,
+								  uint8 *old_vmbits,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -775,10 +795,134 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	if (!prstate->attempt_update_vm)
+		return false;
+
+	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
+										   &vmbuffer);
+
+	/* We do this even if not all-visible */
+	identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+								   nlpdead_items, vmbuffer,
+								   *old_vmbits);
+
+	if (!prstate->all_visible)
+		return false;
+
+	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->all_frozen)
+		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (*new_vmbits == *old_vmbits)
+	{
+		*new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -793,12 +937,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +968,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1011,6 +1162,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	/* Set the visibility map and page visibility hint, if relevant */
+	if (do_set_vm)
+	{
+		Assert(prstate.all_visible);
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
+		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+		 * removed -- and that isn't worth optimizing for. And if we add the
+		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+		 * it must be marked dirty.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!prstate.all_frozen ||
+			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+		visibilitymap_set(params->relation, blockno, buffer,
+						  InvalidXLogRecPtr,
+						  vmbuffer, presult->vm_conflict_horizon,
+						  new_vmbits);
+	}
+
+	/* Save the vmbits for caller */
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = new_vmbits;
 }
 
 
@@ -1485,6 +1695,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ff34a99edbd..d5c57516785 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1933,77 +1928,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2035,13 +1959,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2141,73 +2064,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer, old_vmbits);
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	Assert(presult.all_visible);
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled).  Regardless, set both bits so that we get back in
-	 * sync.
-	 *
-	 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-	 * unnecessarily dirtying the heap buffer. Nearly the only scenario where
-	 * PD_ALL_VISIBLE is set but the VM is not is if the VM was removed -- and
-	 * that isn't worth optimizing for. And if we add the heap buffer to the
-	 * WAL chain (without passing REGBUF_NO_CHANGES), it must be marked dirty.
-	 */
-	PageSetAllVisible(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
 	/*
 	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
 	 * count it as newly set for logging.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if (presult.all_frozen)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..ad2af13ec39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * If we will consider updating the visibility map, vmbuffer should
+	 * contain the correct block of the visibility map and be pinned.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v28-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v28-0005-Move-VM-assert-into-prune-freeze-code.patch)
From 218bcd4dffe014647495a9bba11d8beaeb1465cd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v28 05/15] Move VM assert into prune/freeze code

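This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assert-only cross-check against
heap_page_would_be_all_visible() from lazy_scan_prune() into the
prune/freeze code, before the VM is set. This allows us to remove the
all_visible, all_frozen, and vm_conflict_horizon fields from
PruneFreezeResult.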
---
 src/backend/access/heap/pruneheap.c  | 86 ++++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 68 +---------------------
 src/include/access/heapam.h          | 25 +++-----
 3 files changed, 77 insertions(+), 102 deletions(-)
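
Reviewer note: with all_visible and all_frozen gone from
PruneFreezeResult, callers such as lazy_scan_prune() rely on old_vmbits
and new_vmbits alone. Below is a minimal standalone sketch of that
contract, not the patch's code. will_set_vm() and count_vm_change() are
hypothetical names, VmCounters stands in for the LVRelState counters,
and the corruption checks, locking and WAL logging are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02

/* Hypothetical stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Simplified version of the decision in heap_page_will_set_vm(): returns
 * true only if the VM bits would change, and leaves *new_vmbits at 0
 * whenever it returns false.
 */
static bool
will_set_vm(bool all_visible, bool all_frozen,
			uint8_t old_vmbits, uint8_t *new_vmbits)
{
	/* A page cannot be all-frozen without being all-visible. */
	assert(!all_frozen || all_visible);

	*new_vmbits = 0;
	if (!all_visible)
		return false;

	*new_vmbits = VM_ALL_VISIBLE;
	if (all_frozen)
		*new_vmbits |= VM_ALL_FROZEN;

	if (*new_vmbits == old_vmbits)
	{
		/* Nothing to do; report "no change" as new_vmbits == 0. */
		*new_vmbits = 0;
		return false;
	}
	return true;
}

/*
 * Caller-side bookkeeping driven purely by the before/after bits, in the
 * style of the updated lazy_scan_prune().
 */
static void
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VM_ALL_VISIBLE) == 0 &&
		(new_vmbits & VM_ALL_VISIBLE) != 0)
	{
		c->new_visible_pages++;
		if ((new_vmbits & VM_ALL_FROZEN) != 0)
			c->new_visible_frozen_pages++;
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & VM_ALL_FROZEN) != 0)
	{
		/* Already all-visible before; only the frozen bit is new. */
		assert((new_vmbits & VM_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
	}
}
```

Because new_vmbits is 0 whenever the VM is left alone, the caller can
test the bits directly and no longer needs an early return for the
"not all-visible" and "nothing to do" cases.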

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 14d40476be9..39149fbba7c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -918,6 +918,31 @@ heap_page_will_set_vm(PruneState *prstate,
 	return true;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -971,6 +996,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1129,23 +1155,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1163,6 +1174,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
 	/* Now update the visibility map and PD_ALL_VISIBLE hint */
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
@@ -1209,12 +1260,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * make everything safe for REDO was logged when the page's tuples
 		 * were frozen.
 		 */
-		Assert(!prstate.all_frozen ||
-			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
 
 		visibilitymap_set(params->relation, blockno, buffer,
 						  InvalidXLogRecPtr,
-						  vmbuffer, presult->vm_conflict_horizon,
+						  vmbuffer, vm_conflict_horizon,
 						  new_vmbits);
 	}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d5c57516785..61564aea5fd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -456,20 +456,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2005,32 +1991,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3487,29 +3447,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3533,15 +3470,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ad2af13ec39..bec2f840102 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v28-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.4K, 7-v28-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From beb1ae3557904962dcec6266b882cdc75a0c7051 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v28 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 277 ++++++++++++++++------------
 1 file changed, 158 insertions(+), 119 deletions(-)
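
Reviewer note: the horizon selection that get_conflict_xid() performs
below can be tried outside the server with a small standalone sketch.
This is an illustration under stated assumptions, not the patch's code.
conflict_xid_sketch() and xid_follows() are hypothetical names, and
xid_follows() is meant to follow the comparison rule of
TransactionIdFollows() (modulo-2^32 ordering for normal XIDs, plain
ordering when either XID is a special one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02

/* Hypothetical helper: is id1 logically newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special XIDs (below FirstNormalTransactionId) compare as plain integers. */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;

	/* Normal XIDs compare modulo 2^32. */
	return (int32_t) (id1 - id2) > 0;
}

/* Sketch of the horizon chosen for the combined prune/freeze/VM record. */
static TransactionId
conflict_xid_sketch(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/*
	 * Only flipping an already all-visible page to all-frozen: every tuple
	 * is already visible to all snapshots, so no horizon is needed.
	 */
	if (!do_prune && !do_freeze && do_set_vm &&
		(old_vmbits & VM_ALL_VISIBLE) != 0 &&
		(new_vmbits & VM_ALL_FROZEN) != 0)
		return InvalidTransactionId;

	/* Start from the VM horizon or, failing that, the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer xmax among the removed tuples takes precedence. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

For example, pruning a tuple with xmax 700 while setting the VM with a
visibility cutoff of 500 yields 700, because the removed tuple's xmax is
the newer of the two horizons.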

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 39149fbba7c..3521e70b8d0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid);
 
 
 /*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		do_set_vm &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Helper to correct any corruption detected on an heap page and its
  * corresponding visibility map page after pruning but before setting the
@@ -996,7 +1063,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1004,10 +1070,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 	uint8		new_vmbits = 0;
 	uint8		old_vmbits = 0;
 
-
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
@@ -1068,6 +1134,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									old_vmbits, new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1089,14 +1186,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1110,6 +1210,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * Even if we are only setting the VM and PD_ALL_VISIBLE is
+			 * already set, we don't need to worry about unnecessarily
+			 * dirtying the heap buffer below. Nearly the only scenario where
+			 * PD_ALL_VISIBLE is set but the VM is not is if the VM was
+			 * removed, and that isn't worth optimizing for. And, if we add
+			 * the heap buffer to the WAL chain (without passing
+			 * REGBUF_NO_CHANGES), it must be marked dirty.
+			 */
+			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+									 params->relation->rd_locator);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1117,29 +1238,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1149,43 +1253,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1200,7 +1269,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1210,67 +1280,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	do_set_vm = heap_page_will_set_vm(&prstate,
-									  params->relation,
-									  blockno,
-									  buffer,
-									  page,
-									  vmbuffer,
-									  prstate.lpdead_items,
-									  &old_vmbits,
-									  &new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	/* Set the visibility map and page visibility hint, if relevant */
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		Assert(prstate.all_visible);
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
-		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
-		 * removed -- and that isn't worth optimizing for. And if we add the
-		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
-		 * it must be marked dirty.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
-		visibilitymap_set(params->relation, blockno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, vm_conflict_horizon,
-						  new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	/* Save the vmbits for caller */
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = new_vmbits;
 }
 
 
-- 
2.43.0
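
Editorial sketch (not part of the patch): the horizon selection that the
patch above factors into get_conflict_xid() can be exercised outside the
server. The version below is a minimal, self-contained approximation. The
typedef, the constants, and the helpers xid_follows() and
sketch_conflict_xid() are stand-ins invented for illustration; only the
branch structure is taken from the diff, and xid_follows() is an assumed
simplification of TransactionIdFollows().

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the server's definitions; assumptions for illustration. */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical helper: is id1 logically newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;

	/* Normal XIDs compare modulo 2^32 so that wraparound is tolerated. */
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical mirror of the branch structure of get_conflict_xid(). */
static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/*
	 * Only marking an already all-visible page all-frozen: every tuple is
	 * already visible to all snapshots, so no conflict is needed.
	 */
	if (!do_prune && !do_freeze && do_set_vm &&
		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		return InvalidTransactionId;

	/* Setting the VM takes precedence over the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer removed xmax always wins. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

With these stand-ins, a prune-only call (no freeze, no VM update) returns
latest_xid_removed, while a call that only upgrades an all-visible page to
all-frozen returns InvalidTransactionId whatever the other inputs are.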



  [text/x-patch] v28-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v28-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 3d8444447657a04513e044b4261b5d6334f1bef7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v28 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61564aea5fd..e311e7d6604 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1866,9 +1866,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1885,13 +1888,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0
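
Editorial sketch (not part of the patch): the call to
visibilitymap_set_vmbits() in the patch above passes both
VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. The bit
manipulation behind such a call can be approximated as below. This is a
simplified model under stated assumptions: two bits per heap block, four
heap blocks per map byte, and one flat byte array instead of VM pages
with headers. The names sketch_vm_set() and VM_* are hypothetical; there
is no locking, dirtying, or WAL here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: two bits per heap block, four heap blocks per byte. */
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/*
 * Hypothetical helper: set flags for heap_blk in a flat map and return
 * the bits that were set before the call.
 */
static uint8_t
sketch_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
	uint8_t		status;

	/* Only the two known bits may be passed. */
	assert((flags & VM_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible. */
	assert(flags != VM_ALL_FROZEN);

	status = (map[map_byte] >> map_offset) & VM_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << map_offset);

	return status;
}
```

The return value is what lets a caller tell a page that was newly marked
all-visible apart from one that was only upgraded to all-frozen.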



  [text/x-patch] v28-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (24.7K, 9-v28-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From 1acb98f855c1c28993bbd9ccb90a2250e4d64980 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v28 08/15] Remove XLOG_HEAP2_VISIBLE entirely

No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the record
type. This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 373 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3521e70b8d0..86de3613f5e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1227,8 +1227,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * REGBUF_NO_CHANGES), it must be marked dirty.
 			 */
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
-									 params->relation->rd_locator);
+			visibilitymap_set(blockno, vmbuffer, new_vmbits,
+							  params->relation->rd_locator);
 		}
 
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e311e7d6604..9dec4875e3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,11 +1888,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2769,9 +2769,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
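
For readers following the visibilitymap_set() hunk in the patch above: the function finds the VM bits for a heap block through HEAPBLK_TO_MAPBLOCK and HEAPBLK_TO_MAPBYTE. Below is a minimal standalone sketch of that addressing. It assumes the default 8kB block size and a 24-byte page header, so 8168 usable map bytes per VM page. The names vm_locate() and vm_set_bits() are hypothetical and are not part of the patch, and the sketch does no locking, pinning, or WAL logging.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: default BLCKSZ of 8192 and a 24-byte page header. */
#define MAPSIZE             (8192 - 24)
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  0x03

/* Hypothetical helper type: where a heap block's two bits live in the VM. */
typedef struct VmLocation
{
    uint32_t map_block;     /* which VM page */
    uint32_t map_byte;      /* byte within that page's map area */
    uint8_t  bit_offset;    /* bit position within that byte */
} VmLocation;

static VmLocation
vm_locate(uint32_t heap_blk)
{
    VmLocation loc;

    loc.map_block = heap_blk / HEAPBLOCKS_PER_PAGE;
    loc.map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    loc.bit_offset = (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
    return loc;
}

/*
 * Set flags for heap_blk in 'map', the map area of the VM page that covers
 * it. Returns the bits that were set before the call (a choice made for
 * this sketch only).
 */
static uint8_t
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
    VmLocation loc = vm_locate(heap_blk);
    uint8_t    old;

    assert((flags & ~VM_VALID_BITS) == 0);

    old = (uint8_t) ((map[loc.map_byte] >> loc.bit_offset) & VM_VALID_BITS);
    map[loc.map_byte] |= (uint8_t) (flags << loc.bit_offset);
    return old;
}
```

Because each heap block needs only two bits, one VM byte covers four heap blocks. Under the assumed constants, one VM page covers 32672 heap blocks.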



  [text/x-patch] v28-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (2.2K, 10-v28-0009-Simplify-heap_page_would_be_all_visible-visibili.patch)
From 8ba420f2ac77650d22905ba7b4660dc70dad9383 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v28 09/15] Simplify heap_page_would_be_all_visible visibility
 check

heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.

Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().

This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
---
 src/backend/access/heap/vacuumlazy.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9dec4875e3a..441b4883d89 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3535,6 +3535,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	{
 		ItemId		itemid;
 		HeapTupleData tuple;
+		TransactionId dead_after = InvalidTransactionId;
 
 		/*
 		 * Set the offset number so that we can display it along with any
@@ -3574,12 +3575,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
 		{
 			case HEAPTUPLE_LIVE:
 				{
 					TransactionId xmin;
 
+					Assert(!TransactionIdIsValid(dead_after));
+
 					/* Check comments in lazy_scan_prune. */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
@@ -3614,6 +3617,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_RECENTLY_DEAD:
+				Assert(TransactionIdIsValid(dead_after));
+				/* FALLTHROUGH */
+
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				{
-- 
2.43.0
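
The argument in the 0009 commit message is that this caller never acts on the difference between dead and recently dead tuples. A minimal sketch of that reasoning follows. The enum mirrors the HTSV_Result names but is redefined locally, and page_would_be_all_visible_sketch() is a hypothetical stand-in. The real function also checks xmin commit status and the visibility horizon for live tuples, which is left out here.

```c
#include <stdbool.h>

/* Local stand-in for HTSV_Result; values are illustrative only. */
typedef enum TupleStatus
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} TupleStatus;

/*
 * Hypothetical simplification: a page can only be all-visible if every
 * tuple is live. DEAD and RECENTLY_DEAD take the same branch, so a caller
 * like this gains nothing from a visibility routine that tells them apart.
 */
static bool
page_would_be_all_visible_sketch(const TupleStatus *statuses, int ntuples)
{
    for (int i = 0; i < ntuples; i++)
    {
        switch (statuses[i])
        {
            case HEAPTUPLE_LIVE:
                break;          /* keep scanning */

            case HEAPTUPLE_DEAD:
            case HEAPTUPLE_RECENTLY_DEAD:
            case HEAPTUPLE_INSERT_IN_PROGRESS:
            case HEAPTUPLE_DELETE_IN_PROGRESS:
                return false;   /* any non-live tuple rules the page out */
        }
    }
    return true;
}
```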



  [text/x-patch] v28-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.9K, 11-v28-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 7af3b6f670fd8e4d0bc2141d8d11d54696bc459c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v28 10/15] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 53 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 38 ++++++++++-----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..f4ab1c13169 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 86de3613f5e..e0b19b3e669 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -994,14 +995,14 @@ heap_page_will_set_vm(PruneState *prstate,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1088,6 +1089,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1270,10 +1281,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1794,28 +1804,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid on
+				 * the page to be running. If so, we don't consider the page
+				 * all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 441b4883d89..082cdbc5de8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2723,7 +2723,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3484,7 +3484,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3500,7 +3500,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3583,7 +3583,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 					Assert(!TransactionIdIsValid(dead_after));
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3592,16 +3592,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3633,6 +3634,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bec2f840102..1625b107575 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -439,7 +439,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -453,6 +453,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
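
The deferred check described in the 0010 commit message can be sketched in isolation: track the newest normal xmin while scanning, then consult the horizon once at the end. Everything below is a simplified stand-in. VisHorizon collapses GlobalVisState to a single boundary XID and counts how often it is consulted, and the function names are hypothetical. The XID comparison follows the usual modulo-2^32 rule for normal XIDs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for GlobalVisState: one boundary plus a call counter. */
typedef struct VisHorizon
{
    TransactionId maybe_needed; /* XIDs at or after this may still be running */
    int           checks;       /* number of "expensive" horizon lookups */
} VisHorizon;

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Circular comparison for normal XIDs, plain comparison otherwise. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a > b;
    return (int32_t) (a - b) > 0;
}

static bool
xid_maybe_running(VisHorizon *h, TransactionId xid)
{
    h->checks++;
    return !xid_follows(h->maybe_needed, xid);
}

/*
 * Scan the xmins of live, committed tuples, remember the newest one, and
 * test only that one against the horizon after the loop.
 */
static bool
page_all_visible_sketch(const TransactionId *live_xmins, int n,
                        VisHorizon *h, TransactionId *cutoff)
{
    TransactionId newest = InvalidTransactionId;

    for (int i = 0; i < n; i++)
    {
        if (xid_is_normal(live_xmins[i]) && xid_follows(live_xmins[i], newest))
            newest = live_xmins[i];
    }

    *cutoff = newest;

    /* One horizon check per page; none at all if no normal xmin was seen. */
    if (xid_is_normal(newest) && xid_maybe_running(h, newest))
        return false;
    return true;
}
```

The trade-off named in the commit message shows up here too: the loop always visits every tuple, even when an early xmin would already have failed a cheap per-tuple comparison.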



  [text/x-patch] v28-0011-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 12-v28-0011-Unset-all_visible-sooner-if-not-freezing.patch)
From 11d439fcc4ae35a31be204c2eb8a36d52c162d08 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v28 11/15] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e0b19b3e669..1fa72e19f0d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1669,8 +1669,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1930,8 +1935,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
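
A rough model of the 0011 change, for illustration only. PruneSketch and its functions are hypothetical. The sketch assumes that live-tuple bookkeeping is skipped once all_visible is false, which is what the commit message implies but is not shown in this hunk. XIDs are compared with a plain greater-than, ignoring wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for PruneState. */
typedef struct PruneSketch
{
    bool     attempt_freeze;
    bool     all_visible;
    bool     all_frozen;
    int      lpdead_items;
    int      live_bookkeeping;  /* counts per-tuple work, for illustration */
    uint32_t newest_xmin;
} PruneSketch;

/* A dead item was found: clear the flags right away only if not freezing. */
static void
record_dead(PruneSketch *st)
{
    if (!st->attempt_freeze)
    {
        st->all_visible = false;
        st->all_frozen = false;
    }
    st->lpdead_items++;
}

/* Assumed behavior: live-tuple tracking is only done while all_visible. */
static void
record_live(PruneSketch *st, uint32_t xmin)
{
    if (!st->all_visible)
        return;

    st->live_bookkeeping++;
    if (xmin > st->newest_xmin)
        st->newest_xmin = xmin;
}

/* The delayed clearing that the freezing path still relies on. */
static void
finish_page(PruneSketch *st)
{
    if (st->lpdead_items > 0)
    {
        st->all_visible = false;
        st->all_frozen = false;
    }
}
```

Both paths end with the flags cleared when a dead item is present. The non-freezing path simply gets there earlier and skips the per-tuple work in between.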



  [text/x-patch] v28-0012-Track-which-relations-are-modified-by-a-query.patch (2.5K, 13-v28-0012-Track-which-relations-are-modified-by-a-query.patch)
From afc918c60fd21be6339d88aa4024f908daeab8d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v28 12/15] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
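
How the 0012 bookkeeping is meant to be consumed can be sketched with a plain bitmask. The patch itself uses a Bitmapset via bms_add_member(). The uint64_t set below is only a stand-in limited to range table indexes 0..63, and relids_add(), relids_contains(), and scan_may_set_vm() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a Bitmapset of range table indexes; valid for rti 0..63 only. */
typedef uint64_t RelidSet;

static RelidSet
relids_add(RelidSet set, int rti)
{
    return set | (UINT64_C(1) << rti);
}

static bool
relids_contains(RelidSet set, int rti)
{
    return ((set >> rti) & 1) != 0;
}

/*
 * A scan may set the VM during on-access pruning only if the query does not
 * modify the scanned relation. Otherwise the page is likely to be dirtied
 * again right away and the VM bit would just be cleared.
 */
static bool
scan_may_set_vm(RelidSet modified, int scan_rti)
{
    return !relids_contains(modified, scan_rti);
}
```

For example, with a result relation at RT index 1 and a rowmarked relation at index 3, a scan of index 2 is the only one that would be allowed to set the VM.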



  [text/x-patch] v28-0013-Pass-down-information-on-table-modification-to-s.patch (23.7K, 14-v28-0013-Pass-down-information-on-table-modification-to-s.patch)
From a64995852e82d8132f23c1bb361e0f3c8080389a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v28 13/15] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 93 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1625b107575..0bfe2366e1a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
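
The flag plumbing in the patch above can be illustrated in isolation. The following is a minimal sketch, not PostgreSQL code: the `DEMO_` names and both helper functions are hypothetical stand-ins, and only the `1 << 10` bit position of the read-only hint is taken from the diff. It shows the two patterns the patch relies on: a begin-scan wrapper that ORs its mandatory options into whatever the caller passed (instead of overwriting them), and a consumer that derives `modifies_base_rel` from the absence of the hint.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for ScanOptions bits. Only the position of the
 * read-only hint (1 << 10) mirrors the patch; the others are illustrative.
 */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 1,
	DEMO_SO_HINT_REL_READ_ONLY = 1 << 10
};

/*
 * Mimics the shape of table_beginscan() after the patch: the caller's flags
 * are preserved and the scan-type bits are OR'd in on top of them.
 */
static uint32_t
demo_beginscan_flags(uint32_t caller_flags)
{
	caller_flags |= DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE;
	return caller_flags;
}

/*
 * Mimics the assignment in heapam_index_fetch_begin(): a scan is assumed to
 * modify the relation unless the read-only hint is present.
 */
static bool
demo_modifies_base_rel(uint32_t flags)
{
	return !(flags & DEMO_SO_HINT_REL_READ_ONLY);
}
```

Callers that pass 0 therefore keep the conservative default (the relation is treated as possibly modified), which is why the many mechanical `, 0)` call-site changes in the diff are behavior-preserving.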



  [text/x-patch] v28-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.0K, 15-v28-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 28ccc95ff7620ef07aa3f225d3a7c05aa8c05909 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v28 14/15] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++++-
 src/backend/access/heap/pruneheap.c           | 40 ++++++++++++++++++-
 src/include/access/heapam.h                   | 24 +++++++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1fa72e19f0d..c9821a5830c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  Relation relation,
 								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 								  Buffer vmbuffer,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -945,6 +958,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
  * corrupted, it will fix them by clearing the VM bits and visibility hint.
  * This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * current value of the VM bits in *old_vmbits and the desired new value of
  * the VM bits in *new_vmbits.
@@ -954,6 +970,8 @@ heap_page_will_set_vm(PruneState *prstate,
 					  Relation relation,
 					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 					  Buffer vmbuffer,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
 					  int nlpdead_items,
 					  uint8 *old_vmbits,
 					  uint8 *new_vmbits)
@@ -961,6 +979,24 @@ heap_page_will_set_vm(PruneState *prstate,
 	if (!prstate->attempt_update_vm)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
 										   &vmbuffer);
 
@@ -1157,6 +1193,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  buffer,
 									  page,
 									  vmbuffer,
+									  params->reason,
+									  do_prune, do_freeze,
 									  prstate.lpdead_items,
 									  &old_vmbits,
 									  &new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0bfe2366e1a..3328f56c101 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -420,7 +437,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
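
The early-exit heuristic added to heap_page_will_set_vm() in the patch above can be read as a pure predicate. The sketch below is a hedged restatement, not the patch's code: `DemoPruneReason` and `demo_skip_vm_update()` are hypothetical names, and the two buffer-state booleans stand in for the results of BufferIsDirty() and XLogCheckBufferNeedsBackup(). It returns true in exactly the case where the patch declines to set the VM.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PruneReason; only two values are modeled. */
typedef enum
{
	DEMO_PRUNE_ON_ACCESS,
	DEMO_PRUNE_VACUUM
} DemoPruneReason;

/*
 * Returns true when an on-access call should skip the VM update: the page is
 * all-visible, nothing is being pruned or frozen, and setting the VM would
 * either newly dirty the heap page or force a full-page image into the WAL.
 */
static bool
demo_skip_vm_update(DemoPruneReason reason,
					bool all_visible,
					bool do_prune, bool do_freeze,
					bool buffer_is_dirty, bool needs_fpi)
{
	return reason == DEMO_PRUNE_ON_ACCESS &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_is_dirty || needs_fpi);
}
```

Under this reading, vacuum is never affected by the check, and an on-access call that is already pruning or freezing proceeds to the VM update because the heap page is being modified anyway.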



  [text/x-patch] v28-0015-Set-pd_prune_xid-on-insert.patch (6.7K, 16-v28-0015-Set-pd_prune_xid-on-insert.patch)
From 1df98a32c9b32458621c003f895a71c008c085d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v28 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
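
The hint-setting rule this patch adds to heap_multi_insert() can be
looked at in isolation. The following is a toy model, not PostgreSQL
code: every Toy* and toy_* name is invented for illustration, and the
"keep the oldest XID" behaviour of toy_page_set_prunable() is an
assumption about how the PageSetPrunable() macro behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
/* XIDs below this value are special (bootstrap, frozen), not normal. */
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is older than b"; only meaningful for normal XIDs. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Remember the oldest XID that might leave something prunable behind. */
static void
toy_page_set_prunable(ToyXid *pd_prune_xid, ToyXid xid)
{
	assert(toy_xid_is_normal(xid));
	if (*pd_prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/* Mirrors the condition the patch adds to heap_multi_insert(). */
static void
toy_hint_after_multi_insert(ToyXid *pd_prune_xid, ToyXid xid,
							bool all_frozen_set)
{
	if (!all_frozen_set && toy_xid_is_normal(xid))
		toy_page_set_prunable(pd_prune_xid, xid);
}
```

In this model the hint is left untouched both for a COPY FREEZE style
insert (all_frozen_set) and for a bootstrap-mode XID, which are the two
exceptions named in the patch's comment.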




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 08:55  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-12-18 08:55 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
<[email protected]> wrote:
>
> > in v27-0005. This patch changes code which is not exercised in
> > tests[0]. I spent some time understanding the conditions when we
> > entered this. There is a comment about non-finished relation
> > extension, but I got no success trying to reproduce this. I ended up
> > modifying code to lose PageSetAllVisible in proper places and running
> > vacuum. Looks like everything works as expected. I will spend some
> > more time on this, maybe I will be successful in writing an
> > injection-point-based TAP test which hits this...
>
> Based on the coverage report link you provided, that code is changed
> by v27 0007, not 0005. 0005 is about moving an assertion out of
> lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
> question).
>
> Regarding 0007, it looks like what is uncovered (the orange bits in
> the coverage report are uncovered, I assume) is empty pages _without_
> PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
> called except vacuum and COPY FREEZE.

Sure, I meant 0007.

> If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
> getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
> causing us to vacuum an all-frozen empty page.

Yes, vacuum (disable_page_skipping);

> Then the question is, why wouldn't we have coverage of the empty page
> first being set all-visible/all-frozen? It can't be COPY FREEZE
> because the page is empty. And it can't be vacuum, because then we
> would have coverage. It's very mysterious.
>
> It would be good to have coverage for this case. I don't think you'll
> need an injection point for the main case of "empty page not yet set
> all-visible is vacuumed for the first time" (unless I'm
> misunderstanding something).
>
> I'm not sure how you'll test the "vacuuming an empty, previously
> uninitialized page" case described in this comment, though.
>
>              * It's possible that another backend has extended the heap,
>              * initialized the page, and then failed to WAL-log the page due
>              * to an ERROR.  Since heap extension is not WAL-logged, recovery
>              * might try to replay our record setting the page all-visible and
>              * find that the page isn't initialized, which will cause a PANIC.
>              * To prevent that, check whether the page has been previously
>              * WAL-logged, and if not, do that now.
>
> You'd want to force an error during relation extension and then vacuum
> the page. I don't know if you need an injection point to force the
> error -- depends on what kind of error, I think.

I did some archaeology, and this "if (PageIsEmpty(page)) {   if
(!PageIsAllVisible(page)) { .... }}" code dates back to
608195a3a365. The comment about relation extension not being WAL-logged
is from a6370fd9ed3d, and I don't think we need to think about this case.

I am currently inclined to think that we cannot see an empty page that
does not have PD_ALL_VISIBLE set. This is because when we make a page
empty, we are in a critical section, and we WAL-log everything we do, so
our changes should not be half-made. Maybe as of 608195a3a365, there was
a case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
happens on HEAD.

> So that I know for attribution, did you review 0003-0005?

yes, but I did not have any valuable review points for them.


Also, after the whole set is committed, we should never see a
discrepancy between PD_ALL_VISIBLE and the VM bits, right? Because
they will be set in a single WAL record. The only cases where the heap
and the VM disagree on all-visibility would then be corruption,
pg_visibilitymap_truncate and old data (data from before a v19+ upgrade?).
If my understanding is correct, should we document this?

-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 15:18  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-18 15:18 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Thu, Dec 18, 2025 at 3:55 AM Kirill Reshke <[email protected]> wrote:
>
> On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
> <[email protected]> wrote:
>
> > If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
> > getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
> > causing us to vacuum an all-frozen empty page.
>
> Yes, vacuum (disable_page_skipping);

Ah, right, that would be a reliable way for it to happen.

> > Then the question is, why wouldn't we have coverage of the empty page
> > first being set all-visible/all-frozen? It can't be COPY FREEZE
> > because the page is empty. And it can't be vacuum, because then we
> > would have coverage. It's very mysterious.
<--snip-->
> I am currently inclined to think that we cannot see an empty page that
> has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
> we are in a critical section, and we WAL-log everything we do, so our
> changes should not be half-made. Maybe as of 608195a3a365, there was a
> case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
> happens on HEAD.

Right, so the way that empty pages get set PD_ALL_VISIBLE is when a
page has all its tuples deleted, the next time it is vacuumed it will
be set all-visible and all-frozen and have PD_ALL_VISIBLE set. (if
it's a trailing page it will be truncated, but any non-trailing page
will be like this).

But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused and if all the
line pointers are unused it will always set the page all-visible.

I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.

> I did small archeology and this "if (PageIsEmpty(page)) {   if
> (!PageIsAllVisible(page)) { .... }}" code  originates back to
> 608195a3a365. Comment about not WAL-logged relation extension is from
> a6370fd9ed3d, and I don't think we need to think about this case.

Thanks for looking into this. Even if this code was added to handle
the error codepath I mentioned above, it seems like it would have been
good enough to just let lazy_scan_prune() handle setting the empty
page all-visible the next time the page was vacuumed. Since there is
no non-error code path where this can happen, it doesn't seem like it
would merit its own special case.

It is possible it was more common as of 608195a3a365, as you say.

I don't understand how the bug fixed by a6370fd9ed3d can happen. When
a new page is initialized, flags are set to 0, so regardless of WAL
logging of the extension not happening, how would the new page have
been set PD_ALL_VISIBLE?  We'll have to ask Andres or Robert about how
this was hit.

> Also, after the whole set is committed, we should then never
> experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
> they will be set in a single WAL record. The only cases when heap and
> VM disagrees on all-visibility then are corruption,
> pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> If my understanding is correct, should we document this?

Even on current master, I don't see a scenario other than VM
corruption or truncation where PD_ALL_VISIBLE can be set but not the
VM (or vice versa). The only way would be if you error out after
setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
is not in a critical section in lazy_scan_prune(), so it won't panic
and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
later get written out. But the only obvious way I see to error out of
MarkBufferDirty() is if the buffer is not valid -- which would have
kept us from doing previous operations on the buffer, I would think.

It's true this will no longer happen after my patches, as
PageSetAllVisible() will happen in a critical section. We could add a
comment about this particular scenario in the code somewhere. But I
don't think we should document it in any user-facing documentation
since you could still truncate the VM and have the two out of sync.
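
As a toy model of the scenario described above (not PostgreSQL code;
all names are invented, and the PANIC path is simplified to "the
half-done in-memory change never reaches disk"), the difference between
erroring out outside and inside a critical section looks like this:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyFlags
{
	bool		pd_all_visible;	/* heap page header flag */
	bool		vm_all_visible;	/* visibility map bit */
} ToyFlags;

/*
 * Set PD_ALL_VISIBLE and then the VM bit, starting from the durable state.
 * error_between simulates an error raised between the two steps.
 */
static ToyFlags
toy_set_all_visible(ToyFlags durable, bool error_between,
					bool in_critical_section)
{
	ToyFlags	in_memory = durable;

	/* The model assumes we start from a consistent state. */
	assert(durable.pd_all_visible == durable.vm_all_visible);

	in_memory.pd_all_visible = true;

	if (error_between)
	{
		/* PANIC: shared memory is discarded, the half-done change is lost */
		if (in_critical_section)
			return durable;
		/* plain ERROR: the dirty buffer survives and may be written out */
		return in_memory;
	}

	in_memory.vm_all_visible = true;
	return in_memory;
}

static bool
toy_flags_agree(ToyFlags flags)
{
	return flags.pd_all_visible == flags.vm_all_visible;
}
```

Only the combination "error between the two steps, outside a critical
section" leaves the two flags disagreeing in this model.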

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 15:45  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-12-18 15:45 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<[email protected]> wrote:
> > Also, after the whole set is committed, we should then never
> > experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
> > they will be set in a single WAL record. The only cases when heap and
> > VM disagrees on all-visibility then are corruption,
> > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> > If my understanding is correct, should we document this?
>
> Even on current master, I don't see a scenario other than VM
> corruption or truncation where PD_ALL_VISIBLE can be set but not the
> VM (or vice versa). The only way would be if you error out after
> setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
> is not in a critical section in lazy_scan_prune(), so it won't panic
> and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
> later get written out. But the only obvious way I see to error out of
> MarkBufferDirty() is if the buffer is not valid -- which would have
> kept us from doing previous operations on the buffer, I would think.
>

Well... I may be missing something, but on current HEAD,
XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
records, with XLOG_HEAP2_PRUNE_VACUUM_SCAN always emitted first. So the
WAL writer may get kill-9-ed just after
XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to disk, with
XLOG_HEAP2_VISIBLE never getting there. Then crash recovery runs, and we
have a discrepancy. This does not happen with a single WAL record.
Another simple reproducer: a streaming standby receives
XLOG_HEAP2_PRUNE_VACUUM_SCAN from the primary, then the network goes bad
and it never gets XLOG_HEAP2_VISIBLE. Then the standby is promoted by
the admin. And again, a VM bit vs PD_ALL_VISIBLE discrepancy. Am I
missing something?


-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 18:07  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-12-18 18:07 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<[email protected]> wrote:

> But you are right, I don't see any non-error code path where a heap
> page would become empty (all line pointers set unused) and then not be
> set all-visible. Only vacuum sets line pointers unused and if all the
> line pointers are unused it will always set the page all-visible.
>
> I think, though, that if we error out in lazy_scan_prune() after
> returning from heap_page_prune_and_freeze() such that we don't set the
> empty page all-visible, we can end up with an empty page without
> PD_ALL_VISIBLE set. You can see how this might work by patching the VM
> set code in lazy_scan_prune() to skip empty pages.
>

Thank you for your explanation!  I completely forgot that PD_ALL_VISIBLE
is a non-persistent change (a hint bit), so its update can be trivially
lost.
The simplest real-life example is being killed just after returning
from heap_page_prune_and_freeze, yes.
PFA a TAP test covering the lazy_scan_new_or_empty code path for an
empty-but-not-all-visible page.

-- 
Best regards,
Kirill Reshke


Attachments:

  [application/octet-stream] v1-0001-Add-TAP-test-for-empty-page-vacuum.patch (4.3K, 2-v1-0001-Add-TAP-test-for-empty-page-vacuum.patch)
From ac838953de9c4ab0cb5f13d1e1b8ad0a18e73e39 Mon Sep 17 00:00:00 2001
From: reshke <[email protected]>
Date: Thu, 18 Dec 2025 18:00:22 +0000
Subject: [PATCH v1] Add TAP test for empty page vacuum.

VACUUM can be run for empty pages with the DISABLE_PAGE_SKIPPING option.
In this case, VACUUM will set the page-level visibility bit
(PD_ALL_VISIBLE) if not previously set.
To end up with an empty page that is missing the visibility hint bit, we
need to forcefully cancel (kill -9) the backend executing page freezing,
just after it did page pruning. Use an injection point for this purpose
and add a TAP test to cover the "recovery" after error code path.
---
 src/backend/access/heap/vacuumlazy.c          |  6 ++
 .../test_misc/t/010_vacuum_empty_page.pl      | 75 +++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 src/test/modules/test_misc/t/010_vacuum_empty_page.pl

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..9b8cbb67f11 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -153,6 +153,7 @@
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "storage/read_stream.h"
+#include "utils/injection_point.h"
 #include "utils/lsyscache.h"
 #include "utils/pg_rusage.h"
 #include "utils/timestamp.h"
@@ -1899,6 +1900,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			INJECTION_POINT("vacuum-empty-page-non-all-vis", NULL);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
@@ -2012,6 +2015,9 @@ lazy_scan_prune(LVRelState *vacrel,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
+	
+	INJECTION_POINT("vacuum-heap-prune-and-freeze-after", NULL);
+
 	Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
 	Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
 
diff --git a/src/test/modules/test_misc/t/010_vacuum_empty_page.pl b/src/test/modules/test_misc/t/010_vacuum_empty_page.pl
new file mode 100644
index 00000000000..af5c39d2435
--- /dev/null
+++ b/src/test/modules/test_misc/t/010_vacuum_empty_page.pl
@@ -0,0 +1,75 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Check that VACUUM sets PD_ALL_VISIBLE on an empty heap page that was
+# left without it because an earlier VACUUM errored out right after
+# pruning the page.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize a new PostgreSQL test cluster
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init();
+$node->append_conf(
+	'postgresql.conf', qq(
+log_min_messages = 'notice'
+));
+$node->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+	plan skip_all => 'Extension injection_points not installed';
+}
+
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+
+# Setup table and populate with data
+$node->safe_psql(
+	"postgres", qq{
+CREATE TABLE vac_empty_test(a int);
+BEGIN;
+INSERT INTO vac_empty_test DEFAULT VALUES;
+ROLLBACK;
+});
+
+# Make VACUUM fail after pruning, and report when the empty-page path is hit.
+$node->safe_psql('postgres',
+	"SELECT injection_points_attach('vacuum-heap-prune-and-freeze-after', 'error');");
+$node->safe_psql('postgres',
+	"SELECT injection_points_attach('vacuum-empty-page-non-all-vis', 'notice');");
+
+$node->psql('postgres', "VACUUM (FREEZE) vac_empty_test;", on_error_stop => 1);
+
+my $offset = -s $node->logfile;
+
+# Run vacuum again, forcing it to visit the empty page.
+$node->safe_psql(
+	"postgres", qq{
+VACUUM (DISABLE_PAGE_SKIPPING) vac_empty_test;
+});
+
+ok( $node->log_contains(
+		qr/NOTICE:  notice triggered for injection point vacuum-empty-page-non-all-vis/,
+		$offset),
+	"vacuum sets all-visible page bit for empty page");
+
+
+$node->safe_psql('postgres',
+	"SELECT injection_points_detach('vacuum-heap-prune-and-freeze-after');");
+$node->safe_psql('postgres',
+	"SELECT injection_points_detach('vacuum-empty-page-non-all-vis');");
+
+$node->stop('fast');
+done_testing();
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 19:57  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-12-18 19:57 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <[email protected]> wrote:
>
> On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
> <[email protected]> wrote:
>
> > But you are right, I don't see any non-error code path where a heap
> > page would become empty (all line pointers set unused) and then not be
> > set all-visible. Only vacuum sets line pointers unused and if all the
> > line pointers are unused it will always set the page all-visible.
> >
> > I think, though, that if we error out in lazy_scan_prune() after
> > returning from heap_page_prune_and_freeze() such that we don't set the
> > empty page all-visible, we can end up with an empty page without
> > PD_ALL_VISIBLE set. You can see how this might work by patching the VM
> > set code in lazy_scan_prune() to skip empty pages.
>
> Thank you for your explanation!  I completely forgot that PD_ALL_VIS
> is a non-persistent change (hint bit). so its update can be trivially
> lost.
> The simplest real-life example is being killed just after returning
> from heap_page_prune_and_freeze, yes.
> PFA tap test covering lazy_scan_new_or_empty code path for
> empty-but-not-all-visible page

Cool test! I'm going to have to think more about whether or not it is
worth adding a whole new TAP test for this codepath. Is there an
existing TAP test we could add it to so we don't need to make a new
cluster, etc? How long does the test take to run? Obviously it will be
quite short, but every bit we add to the test suite counts. I don't
actually know how much overhead there is with injection points.

I was chatting with Andres and he mentioned there is one other case
where you can end up in this code path (empty page without
PD_ALL_VISIBLE set) and this case does actually trigger this code:

            if (RelationNeedsWAL(vacrel->rel) &&
                !XLogRecPtrIsValid(PageGetLSN(page)))
                log_newpage_buffer(buf, true);

If you are inserting to a new page and you successfully call
PageInit() (making the page no longer considered new by PageIsNew()
because pd_upper will be set) but you error out before actually
inserting the tuple, then you will have an empty page without
PD_ALL_VISIBLE set. And assuming you error out before emitting WAL,
the page will not have a valid LSN set. So you will hit that code
which calls log_newpage_buffer().
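
The two ways of reaching an empty page without PD_ALL_VISIBLE discussed
in this thread (an error right after pruning, and an error between
PageInit() and the actual insert) can be told apart by whether the page
has a valid LSN. Below is a toy classifier, not PostgreSQL code: the
ToyPage layout and all toy_* names are invented, "no items" stands in
for PageIsEmpty(), and an LSN of 0 stands in for an invalid LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyPage
{
	uint16_t	pd_upper;		/* 0 until the page has been initialized */
	uint16_t	nline_pointers;	/* 0 means the page holds no items */
	bool		pd_all_visible;
	uint64_t	lsn;			/* 0 means the page was never WAL-logged */
} ToyPage;

typedef enum ToyEmptyPageAction
{
	TOY_NEW_PAGE,				/* uninitialized: the PageIsNew() branch */
	TOY_NOT_EMPTY,				/* has items: left to lazy_scan_prune() */
	TOY_ALREADY_ALL_VISIBLE,	/* empty, nothing to do */
	TOY_SET_ALL_VISIBLE,		/* empty, flag missing, page was WAL-logged */
	TOY_LOG_NEWPAGE_THEN_SET	/* empty, flag missing, never WAL-logged */
} ToyEmptyPageAction;

static ToyEmptyPageAction
toy_classify_page(const ToyPage *page, bool rel_needs_wal)
{
	assert(page != NULL);

	if (page->pd_upper == 0)
		return TOY_NEW_PAGE;
	if (page->nline_pointers > 0)
		return TOY_NOT_EMPTY;
	if (page->pd_all_visible)
		return TOY_ALREADY_ALL_VISIBLE;
	/* Stand-in for the log_newpage_buffer() condition quoted above. */
	if (rel_needs_wal && page->lsn == 0)
		return TOY_LOG_NEWPAGE_THEN_SET;
	return TOY_SET_ALL_VISIBLE;
}
```

In this model, a page left behind by an error after pruning has a
nonzero LSN and lands in TOY_SET_ALL_VISIBLE, while a page that was
initialized but never inserted into lands in TOY_LOG_NEWPAGE_THEN_SET.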

I would say this case is so narrow (the log_newpage_buffer() codepath
in lazy_scan_new_or_empty()), it's not worth the added test overhead,
but I just wanted to share what I learned about when this code could
be hit.

Previously it was more common in the bulk extension case to have empty
pages not set PD_ALL_VISIBLE because bulk extension would call
PageInit() on all of the pages it extended so all the pages except the
target page were empty (today they are not initialized so they go into
the PageIsNew() branch).

So, in both cases, it seems like the empty page without PD_ALL_VISIBLE
set is mostly only hit if we previously errored out.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 20:04  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-18 20:04 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Thu, Dec 18, 2025 at 10:46 AM Kirill Reshke <[email protected]> wrote:
>
> On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
> <[email protected]> wrote:
> > > Also, after the whole set is committed, we should then never
> > > experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
> > > they will be set in a single WAL record. The only cases when heap and
> > > VM disagrees on all-visibility then are corruption,
> > > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> > > If my understanding is correct, should we document this?
> >
> > Even on current master, I don't see a scenario other than VM
> > corruption or truncation where PD_ALL_VISIBLE can be set but not the
> > VM (or vice versa). The only way would be if you error out after
> > setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
> > is not in a critical section in lazy_scan_prune(), so it won't panic
> > and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
> > later get written out. But the only obvious way I see to error out of
> > MarkBufferDirty() is if the buffer is not valid -- which would have
> > kept us from doing previous operations on the buffer, I would think.
>
> Well... I may be missing something, but on current HEAD,
> XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
> records, XLOG_HEAP2_PRUNE_VACUUM_SCAN being always emitted first. So,
> WAL writer may end up kill-9-ed just after
> XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to the disk, and
> XLOG_HEAP2_VISIBLE never. Crash recovery then, and we have
> discrepancy. This does not happen with a single WAL record.
> Another simple reproducer here: standby streaming, receiving
> XLOG_HEAP2_PRUNE_VACUUM_SCAN from primary, Then network becomes bad,
> and we never get XLOG_HEAP2_VISIBLE from primary. Then we promoted by
> the admin. And again, VM bit vs PD_ALL_VISIBLE discrepancy. Am I
> missing something?

Well, currently XLOG_HEAP2_PRUNE_VACUUM_SCAN doesn't set
PD_ALL_VISIBLE. PD_ALL_VISIBLE is WAL-logged in the XLOG_HEAP2_VISIBLE
record because in lazy_scan_prune() we call PageSetAllVisible() and
then visibilitymap_set() -> log_heap_visible() adds the heap buffer to
the WAL chain (with XLogRegisterBuffer()).

And if you notice when XLOG_HEAP2_VISIBLE is replayed in
heap_xlog_visible(), that is where we do PageSetAllVisible() on the
heap page.

So I think you can end up with PD_ALL_VISIBLE set if you error out
precisely between setting it and WAL logging it because we don't set
it in a critical section. But you can't end up with a WAL record that
sets PD_ALL_VISIBLE and another one that sets the VM.

Once we have my code changes, you can never end up with PD_ALL_VISIBLE
set and the VM not set because they are in the same critical section
and if we error out, it will cause a panic which will purge shared
memory.
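
The WAL side of this argument can be checked with a toy replay model
(not PostgreSQL code; the record names echo the thread, but the types
and toy_* functions are invented for illustration). It replays only a
prefix of the WAL stream, as would happen after a crash or a promoted
standby that never received the tail, and compares the heap flag with
the VM bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyWalRecord
{
	TOY_PRUNE_VACUUM_SCAN,	/* master: prune only, PD_ALL_VISIBLE untouched */
	TOY_VISIBLE,			/* master: sets PD_ALL_VISIBLE and the VM bit */
	TOY_PRUNE_WITH_VM		/* patched: prune, PD_ALL_VISIBLE and VM together */
} ToyWalRecord;

typedef struct ToyReplayState
{
	bool		pruned;
	bool		pd_all_visible;	/* heap page header flag */
	bool		vm_all_visible;	/* visibility map bit */
} ToyReplayState;

/* Replay the first nreplayed records of wal[] onto a fresh page. */
static ToyReplayState
toy_replay_prefix(const ToyWalRecord *wal, size_t nrecords, size_t nreplayed)
{
	ToyReplayState state = {false, false, false};

	assert(nreplayed <= nrecords);

	for (size_t i = 0; i < nreplayed; i++)
	{
		switch (wal[i])
		{
			case TOY_PRUNE_VACUUM_SCAN:
				state.pruned = true;
				break;
			case TOY_VISIBLE:
				state.pd_all_visible = true;
				state.vm_all_visible = true;
				break;
			case TOY_PRUNE_WITH_VM:
				state.pruned = true;
				state.pd_all_visible = true;
				state.vm_all_visible = true;
				break;
		}
	}
	return state;
}

/* True if the heap page flag and the VM bit agree after replay. */
static bool
toy_replayed_flags_agree(ToyReplayState state)
{
	return state.pd_all_visible == state.vm_all_visible;
}
```

Under these assumptions, no prefix of either stream leaves the flag and
the VM bit disagreeing; losing the tail of the two-record stream only
yields a pruned page that is not yet marked all-visible anywhere.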

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-18 20:31  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Kirill Reshke @ 2025-12-18 20:31 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Fri, 19 Dec 2025 at 00:58, Melanie Plageman
<[email protected]> wrote:
>
> On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <[email protected]> wrote:
> >
> > On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
> > <[email protected]> wrote:
> >
> > > But you are right, I don't see any non-error code path where a heap
> > > page would become empty (all line pointers set unused) and then not be
> > > set all-visible. Only vacuum sets line pointers unused and if all the
> > > line pointers are unused it will always set the page all-visible.
> > >
> > > I think, though, that if we error out in lazy_scan_prune() after
> > > returning from heap_page_prune_and_freeze() such that we don't set the
> > > empty page all-visible, we can end up with an empty page without
> > > PD_ALL_VISIBLE set. You can see how this might work by patching the VM
> > > set code in lazy_scan_prune() to skip empty pages.
> >
> > Thank you for your explanation!  I completely forgot that PD_ALL_VIS
> > is a non-persistent change (hint bit). so its update can be trivially
> > lost.
> > The simplest real-life example is being killed just after returning
> > from heap_page_prune_and_freeze, yes.
> > PFA tap test covering lazy_scan_new_or_empty code path for
> > empty-but-not-all-visible page
>
> Cool test! I'm going to have to think more about whether or not it is
> worth adding a whole new TAP test for this codepath. Is there an
> existing TAP test we could add it to so we don't need to make a new
> cluster, etc? How long does the test take to run? Obviously it will be
> quite short, but every bit we add to the test suite counts. I don't
> actually know how much overhead there is with injection points.
>

Well, on my PC this test runs in ~1.5 sec. I did not find any other
TAP test to put this in, so I created a new one.
Actually, I only check for specific patterns in the log file of the
cluster in this test, so this test can instead be a regression test.

```
reshke=# VACUUM (DISABLE_PAGE_SKIPPING) vac_empty_test;
NOTICE:  notice triggered for injection point vacuum-empty-page-non-all-vis
VACUUM
reshke=#
```
We will just check in the .out file that the code hits
'vacuum-empty-page-non-all-vis' after an error.
Injection point overhead should not be that bad, in my experience.
Maybe buildfarm members can say something here, I dunno.

Also, we already have a bunch of regression+inj point tests for some
rare cases, exempli gratia
src/test/modules/nbtree/sql/nbtree_half_dead_pages.sql.

-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-19 03:38  Xuneng Zhou <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Xuneng Zhou @ 2025-12-19 03:38 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

Hi Melanie,

Thanks for working on this.

On Wed, Dec 17, 2025 at 12:59 AM Melanie Plageman
<[email protected]> wrote:
>
> On Wed, Dec 3, 2025 at 6:07 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > If we're just talking about the renaming, looking at procarray.c, it
> > is full of the word "removable" because its functions were largely
> > used to examine and determine if everyone can see an xmax as committed
> > and thus if that tuple is removable from their perspective. But
> > nothing about the code that I can see means it has to be an xmax. We
> > could just as well use the functions to determine if everyone can see
> > an xmin as committed.
>
> In the attached v27, I've removed the commit that renamed functions in
> procarray.c. I've added a single wrapper GlobalVisTestXidNotRunning()
> that is used in my code where I am testing live tuples. I think you'll
> find that I've addressed all of your review comments now -- as I've
> also gotten rid of the confusing blk_known_av logic through a series
> of refactors.
>
> The one outstanding point is which commits should bump
> XLOG_PAGE_MAGIC. (also review of the reworked patches).
>
> - Melanie

I’ve done a basic review of patches 1 and 2. Here are some comments
which may be somewhat immature, as this is a fairly large change set
and I’m new to some parts of the code.

1) Potential stale old_vmbits after VM repair in v2

// Corruption check 1
if (!PageIsAllVisible(page) &&
    (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
    visibilitymap_clear(...); // VM now cleared to 0
    // but old_vmbits still holds ALL_VISIBLE
}

// ... later ...

if (!presult.all_visible)
    return presult.ndeleted; // Not taken if presult.all_visible=true

new_vmbits = VISIBILITYMAP_ALL_VISIBLE; // Want to set this

// Stale old_vmbits=ALL_VISIBLE, new_vmbits=ALL_VISIBLE
if (old_vmbits == new_vmbits)
    return presult.ndeleted; // issue: early return

After corruption repair clears the VM, old_vmbits is stale. The early
return can fire unexpectedly, leaving the VM cleared when it should be
re-set. Should we reset old_vmbits = 0 after the visibilitymap_clear?
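
To make the failure mode concrete, here is a minimal standalone sketch of
the same control flow. It is a toy model, not the patch's code: the
SketchPage type, the SKETCH_* bit values and sketch_set_vm() are all
hypothetical names made up for illustration, and the reset_after_clear
argument only exists to compare the two behaviours side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VM flag bits; the values are illustrative. */
#define SKETCH_ALL_VISIBLE 0x01
#define SKETCH_ALL_FROZEN  0x02

/* Toy model of one heap page plus the VM bits that describe it. */
typedef struct SketchPage
{
	bool		pd_all_visible;	/* page-level hint */
	uint8_t		vmbits;			/* bits currently stored in the "VM" */
} SketchPage;

/*
 * Mimics the flow under discussion. Returns true if the VM ended up being
 * set by this call. reset_after_clear chooses whether the cached copy of
 * the VM bits is zeroed after the corruption repair.
 */
static bool
sketch_set_vm(SketchPage *p, bool prune_all_visible, bool reset_after_clear)
{
	uint8_t		old_vmbits;
	uint8_t		new_vmbits;

	assert(p);
	old_vmbits = p->vmbits;

	/* Corruption repair: VM bit set while the page-level bit is clear. */
	if (!p->pd_all_visible && (old_vmbits & SKETCH_ALL_VISIBLE) != 0)
	{
		p->vmbits = 0;
		if (reset_after_clear)
			old_vmbits = 0;		/* keep the cached copy in sync */
	}

	if (!prune_all_visible)
		return false;

	new_vmbits = SKETCH_ALL_VISIBLE;

	/* With a stale old_vmbits this "nothing to do" test fires wrongly. */
	if (old_vmbits == new_vmbits)
		return false;

	p->pd_all_visible = true;
	p->vmbits = new_vmbits;
	return true;
}
```

Calling sketch_set_vm() on a page with the VM bit set and the page-level
bit clear leaves the VM cleared when reset_after_clear is false, and sets
it again when reset_after_clear is true.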

2) Add Assert(BufferIsDirty(buf))

Since the patch's core claim is "buffer must be dirty before WAL
registration", an assertion encodes this invariant. Should we add:

Assert(BufferIsValid(buf));
Assert(BufferIsDirty(buf));

right before the visibilitymap_set() call?

3) Comment about "only scenario"

The comment at lines:
> "The only scenario where it is not already dirty is if the VM was removed…"

This phrasing could become misleading after future refactors. Can we
make it more direct like:

> "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."

4) Comment clarity

Current comment:

> "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."

In this test we now call MarkBufferDirty() on the heap page even when
only setting the VM, so the comments claiming “does not need to modify
the heap buffer”/“no heap page modification” might be misleading. It
might be better to say the test doesn’t need to modify heap
tuples/page contents or doesn’t need to prune/freeze.

--
Best,
Xuneng





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-19 21:09  Melanie Plageman <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-19 21:09 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

Attached v29 addresses some feedback and also corrects a small error
with the assertion I had added in the previous version's 0009.

On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <[email protected]> wrote:
>
> I’ve done a basic review of patches 1 and 2. Here are some comments
> which may be somewhat immature, as this is a fairly large change set
> and I’m new to some parts of the code.
>
> 1) Potential stale old_vmbits after VM repair in v2

Good catch! I've fixed this in attached v29.

> 2) Add Assert(BufferIsDirty(buf))
>
> Since the patch's core claim is "buffer must be dirty before WAL
> registration", an assertion encodes this invariant. Should we add:
>
> Assert(BufferIsValid(buf));
> Assert(BufferIsDirty(buf));
>
> right before the visibilitymap_set() call?

There are already assertions that will trip in various places -- most
importantly in XLogRegisterBuffer(), which is the one that inspired
this refactor.

> The comment at lines:
> > "The only scenario where it is not already dirty is if the VM was removed…"
>
> This phrasing could become misleading after future refactors. Can we
> make it more direct like:
>
> > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."

I see your point about future refactors missing updating comments like
this. But, I don't think we are going to refactor the code such that
we can have PD_ALL_VISIBLE set without the VM bits set more often.
Also, it is common practice in Postgres to describe very specific edge
cases or odd scenarios in order to explain code that may seem
confusing without the comment. It does risk that comment later
becoming stale, but it is better that future developers understand why
the code is there.

That being said, I take your point that the comment is confusing. I
have updated it in a different way.

> > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
>
> In this test we now call MarkBufferDirty() on the heap page even when
> only setting the VM, so the comments claiming “does not need to modify
> the heap buffer”/“no heap page modification” might be misleading. It
> might be better to say the test doesn’t need to modify heap
> tuples/page contents or doesn’t need to prune/freeze.

The point I'm trying to make is that we have to dirty the buffer even
if we don't modify the page because of the XLOG sub-system
requirements. And, it may seem like a waste to do that if not
modifying the page, but the page will rarely be clean anyway. I've
tried to make this more clear in attached v29.
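
As a rough illustration of that ordering requirement, here is a toy model.
It is only a sketch under assumptions: SketchBuffer, sketch_mark_dirty(),
sketch_register() and sketch_set_vm_only() are hypothetical names and not
the real buffer manager or XLOG API. The only behaviour it copies is that
registering a buffer asserts it is dirty unless a "no changes" flag is
passed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer; both fields are illustrative only. */
typedef struct SketchBuffer
{
	bool		dirty;			/* has the buffer been marked dirty? */
	bool		registered;		/* has it been added to the WAL chain? */
} SketchBuffer;

static void
sketch_mark_dirty(SketchBuffer *b)
{
	b->dirty = true;
}

/*
 * Mimics the rule described above: a buffer registered without the
 * "no changes" flag must already be dirty, otherwise we assert out.
 */
static void
sketch_register(SketchBuffer *b, bool no_changes)
{
	assert(no_changes || b->dirty);
	b->registered = true;
}

/*
 * The "only set the VM" case: the heap page contents are not modified,
 * but the heap buffer is still dirtied before it is registered.
 */
static void
sketch_set_vm_only(SketchBuffer *heap_buf)
{
	sketch_mark_dirty(heap_buf);
	sketch_register(heap_buf, false);
}
```

In this model, calling sketch_register() on a clean buffer with
no_changes set to false trips the assertion, which is the situation the
unconditional dirtying avoids.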

- Melanie


Attachments:

  [text/x-patch] v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (10.2K, 2-v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
  download | inline diff:
From 8442278884c0d128547910d17d3b640e0a4078e4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v29 01/15] Combine visibilitymap_set() cases in
 lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Srinath Reddy Sadipiralla <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 44 ++++++++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 20 +++++
 src/backend/access/heap/vacuumlazy.c          | 87 ++++---------------
 3 files changed, 82 insertions(+), 69 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 --
 -- recently-dropped table
 --
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column? 
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 
 --
 -- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		/*
 		 * It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
+		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+		 * removed -- and that isn't worth optimizing for. And if we add the
+		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+		 * it must be marked dirty.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!presult.all_frozen ||
+			   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (14.0K, 3-v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
  download | inline diff:
From 80933bcb9f6a762a91ed773e36ea51e800105fac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v29 02/15] Eliminate use of cached VM value in
 lazy_scan_prune()

lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.

Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.

Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible that after
fixing corruption, the VM could be newly set, if pruning found the page
all-visible.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 182 ++++++++++++---------------
 1 file changed, 83 insertions(+), 99 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..d47ed7814c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
  */
 #define EAGER_SCAN_REGION_SIZE 4096
 
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
 typedef struct LVRelState
 {
 	/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
 	/* State maintained by heap_vac_scan_next_block() */
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
-	bool		next_unskippable_allvis;	/* its visibility status */
 	bool		next_unskippable_eager_scanned; /* if it was eagerly scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
 
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
+							Buffer vmbuffer,
 							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	/* Initialize for the first heap_vac_scan_next_block() call */
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
-	vacrel->next_unskippable_allvis = false;
 	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
 										MAIN_FORKNUM,
 										heap_vac_scan_next_block,
 										vacrel,
-										sizeof(uint8));
+										sizeof(bool));
 
 	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		uint8		blk_info = 0;
+		bool		was_eager_scanned = false;
 		int			ndeleted = 0;
 		bool		has_lpdead_items;
 		void	   *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!BufferIsValid(buf))
 			break;
 
-		blk_info = *((uint8 *) per_buffer_data);
+		was_eager_scanned = *((bool *) per_buffer_data);
 		CheckBufferIsPinnedOnce(buf);
 		page = BufferGetPage(buf);
 		blkno = BufferGetBlockNumber(buf);
 
 		vacrel->scanned_pages++;
-		if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+		if (was_eager_scanned)
 			vacrel->eager_scanned_pages++;
 
 		/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
 									   vmbuffer,
-									   blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
 									   &has_lpdead_items, &vm_page_frozen);
 
 		/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * exclude pages skipped due to cleanup lock contention from eager
 		 * freeze algorithm caps.
 		 */
-		if (got_cleanup_lock &&
-			(blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+		if (got_cleanup_lock && was_eager_scanned)
 		{
 			/* Aggressive vacuums do not eager scan. */
 			Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
 {
 	BlockNumber next_block;
 	LVRelState *vacrel = callback_private_data;
-	uint8		blk_info = 0;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
 		 * otherwise they would've been unskippable.
 		 */
 		vacrel->current_block = next_block;
-		blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		*((uint8 *) per_buffer_data) = blk_info;
+		/* Block was not eager scanned */
+		*((bool *) per_buffer_data) = false;
 		return vacrel->current_block;
 	}
 	else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
 		Assert(next_block == vacrel->next_unskippable_block);
 
 		vacrel->current_block = next_block;
-		if (vacrel->next_unskippable_allvis)
-			blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		if (vacrel->next_unskippable_eager_scanned)
-			blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
-		*((uint8 *) per_buffer_data) = blk_info;
+		*((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
 		return vacrel->current_block;
 	}
 }
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
 	bool		next_unskippable_eager_scanned = false;
-	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 													   next_unskippable_block,
 													   &next_unskippable_vmbuffer);
 
-		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
 		/*
 		 * At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
 		 */
-		if (!next_unskippable_allvis)
+		if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
 			break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
-	vacrel->next_unskippable_allvis = next_unskippable_allvis;
 	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
  * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
 				bool *has_lpdead_items,
 				bool *vm_page_frozen)
 {
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
+	uint8		old_vmbits = 0;
+	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
-		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
-		 * removed -- and that isn't worth optimizing for. And if we add the
-		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
-		 * it must be marked dirty.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!presult.all_frozen ||
-			   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
+	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
 	/*
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
 	 * with buffer lock before concluding that the VM is corrupt.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if (!PageIsAllVisible(page) &&
+		(old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2169,6 +2086,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
+		/* VM bits are now clear */
+		old_vmbits = 0;
 	}
 
 	/*
@@ -2196,6 +2115,71 @@ lazy_scan_prune(LVRelState *vacrel,
 		MarkBufferDirty(buf);
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
+		/* VM bits are now clear */
+		old_vmbits = 0;
+	}
+
+	if (!presult.all_visible)
+		return presult.ndeleted;
+
+	/* Set the visibility map and page visibility hint */
+	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (presult.all_frozen)
+		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	/* Nothing to do */
+	if (old_vmbits == new_vmbits)
+		return presult.ndeleted;
+
+	Assert(presult.all_visible);
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled). Regardless, set both bits so that we get back in
+	 * sync.
+	 *
+	 * The heap buffer must be marked dirty before adding it to the WAL chain
+	 * when setting the VM. We don't worry about unnecessarily dirtying the
+	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+	 * the VM bits clear, so there is no point in optimizing it.
+	 */
+	PageSetAllVisible(page);
+	MarkBufferDirty(buf);
+
+	/*
+	 * If the page is being set all-frozen, we pass InvalidTransactionId as
+	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+	 * everything safe for REDO was logged when the page's tuples were frozen.
+	 */
+	Assert(!presult.all_frozen ||
+		   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+	visibilitymap_set(vacrel->rel, blkno, buf,
+					  InvalidXLogRecPtr,
+					  vmbuffer, presult.vm_conflict_horizon,
+					  new_vmbits);
+
+	/*
+	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
+	 * count it as newly set for logging.
+	 */
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	{
+		vacrel->vm_new_visible_pages++;
+		if (presult.all_frozen)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 presult.all_frozen)
+	{
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
-- 
2.43.0
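
Reader's note on the hunk above: the tail of lazy_scan_prune() now computes the desired VM bits, returns early when nothing would change, and only then bumps the vm_new_* counters. A minimal standalone sketch of that decision follows. It is not code from the patch: vm_bits_to_set(), count_vm_transition(), the VmCounters struct and the VM_* macros are hypothetical names, with the macros standing in for the VISIBILITYMAP_* flags, and nothing here touches buffers or WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VISIBILITYMAP_* flags (assumed values). */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical counters, modeled on the vm_new_* fields of LVRelState. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Return the VM bits that should be set for the page, or 0 if the page is
 * not all-visible or the VM already holds exactly the desired bits.
 */
static uint8_t
vm_bits_to_set(uint8_t old_vmbits, bool all_visible, bool all_frozen)
{
	uint8_t		new_vmbits;

	/* A page cannot be all-frozen without also being all-visible. */
	assert(!all_frozen || all_visible);

	if (!all_visible)
		return 0;

	new_vmbits = VM_ALL_VISIBLE;
	if (all_frozen)
		new_vmbits |= VM_ALL_FROZEN;

	/* Nothing to do */
	if (new_vmbits == old_vmbits)
		return 0;

	return new_vmbits;
}

/*
 * Bump the counters for a page whose VM bits are being set. Returns true
 * if the page counts as newly frozen (the patch's *vm_page_frozen).
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, bool all_frozen)
{
	if ((old_vmbits & VM_ALL_VISIBLE) == 0)
	{
		c->new_visible_pages++;
		if (all_frozen)
		{
			c->new_visible_frozen_pages++;
			return true;
		}
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 && all_frozen)
	{
		c->new_frozen_pages++;
		return true;
	}
	return false;
}
```

In this sketch a caller would invoke count_vm_transition() only when vm_bits_to_set() returns a nonzero value, which matches the order of operations in the hunk.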



  [text/x-patch] v29-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.7K, 4-v29-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
From 0527745e9d51b96520d741da9a9c099fcd82a9f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v29 03/15] Refactor lazy_scan_prune() VM clear logic into
 helper

Move the visibility map corruption checks in lazy_scan_prune() into a
new helper, identify_and_fix_vm_corruption(). This makes
lazy_scan_prune() easier to follow. There is no functional change.

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
 1 file changed, 85 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d47ed7814c8..2a027828891 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page,
+										   int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1928,6 +1933,83 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+/*
+ * Helper to correct any corruption detected on an heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,54 +2152,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(page) &&
-		(old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		/* VM bits are now clear */
+	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+									   presult.lpdead_items, vmbuffer,
+									   old_vmbits))
 		old_vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		/* VM bits are now clear */
-		old_vmbits = 0;
-	}
 
 	if (!presult.all_visible)
 		return presult.ndeleted;
-- 
2.43.0



  [text/x-patch] v29-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.6K, 5-v29-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From c8126c7046b81296d9cf3c81c8b6d6e5d9cf0951 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v29 04/15] Set the VM in heap_page_prune_and_freeze()

Move the logic that sets the VM after pruning and freezing from
lazy_scan_prune() into heap_page_prune_and_freeze(). This commit has no
independent benefit; it exists only to ease review. As of this commit,
setting the VM still emits a separate WAL record. Moving the logic into
pruneheap.c separately from combining the WAL records makes each step
easier to review.
---
 src/backend/access/heap/pruneheap.c  | 309 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 150 +------------
 src/include/access/heapam.h          |  21 ++
 3 files changed, 294 insertions(+), 186 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..62404768bef 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+								  Relation relation,
+								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+								  Buffer vmbuffer,
+								  int nlpdead_items,
+								  uint8 *old_vmbits,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * while the page is all-visible. Once a tuple is encountered that is not
+	 * visible to all, this field is no longer maintained. While it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -775,10 +795,141 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	if (!prstate->attempt_update_vm)
+		return false;
+
+	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
+										   &vmbuffer);
+
+	/* Check for and fix VM corruption even if the page is not all-visible */
+	if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+									   nlpdead_items, vmbuffer,
+									   *old_vmbits))
+		*old_vmbits = 0;
+
+	if (!prstate->all_visible)
+		return false;
+
+	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->all_frozen)
+		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (*new_vmbits == *old_vmbits)
+	{
+		*new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -793,12 +944,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1011,6 +1169,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	/* Set the visibility map and page visibility hint, if relevant */
+	if (do_set_vm)
+	{
+		Assert(prstate.all_visible);
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled). Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * The heap buffer must be marked dirty before adding it to the WAL
+		 * chain when setting the VM. We don't worry about unnecessarily
+		 * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+		 * It is extremely rare to have a clean heap buffer with
+		 * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+		 * point in optimizing it.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!prstate.all_frozen ||
+			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+		visibilitymap_set(params->relation, blockno, buffer,
+						  InvalidXLogRecPtr,
+						  vmbuffer, presult->vm_conflict_horizon,
+						  new_vmbits);
+	}
+
+	/* Save the vmbits for caller */
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = new_vmbits;
 }
 
 
@@ -1485,6 +1702,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a027828891..8b489349312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 vmbits);
+
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1933,83 +1929,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2041,13 +1960,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2147,75 +2065,25 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-									   presult.lpdead_items, vmbuffer,
-									   old_vmbits))
-		old_vmbits = 0;
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	Assert(presult.all_visible);
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled). Regardless, set both bits so that we get back in
-	 * sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
 
 	/*
 	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
 	 * count it as newly set for logging.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if (presult.all_frozen)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..ad2af13ec39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * If we will consider updating the visibility map, vmbuffer should
+	 * contain the correct block of the visibility map and be pinned.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
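
An illustrative aside on the accounting change in lazy_scan_prune() above. Because new_vmbits is expected to be 0 whenever heap_page_prune_and_freeze() does not set the VM, the counters can be derived purely from the old/new bit pair. The following is a minimal standalone sketch of that transition logic, not code from the patch: the VmCounters struct and count_vm_transition() are hypothetical names, and the bit values are assumed to match PostgreSQL's visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the bit values in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Update the counters for one page given the VM bits before and after
 * pruning. Returns true if the page was newly set all-frozen in the VM.
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		page_frozen = false;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page went from not all-visible to all-visible. */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen_pages++;
			page_frozen = true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is now also all-frozen. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
		page_frozen = true;
	}

	/* new_vmbits == 0 (VM not set) falls through without counting. */
	return page_frozen;
}
```

Checking new_vmbits in the first branch matters: with only the old_vmbits test, a page whose VM bits were left untouched would still be counted as newly all-visible.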



  [text/x-patch] v29-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v29-0005-Move-VM-assert-into-prune-freeze-code.patch)
From f801dd86f6b7b49b2d2aa747d1c42a11efbc53ab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v29 05/15] Move VM assert into prune/freeze code

This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
 src/backend/access/heap/pruneheap.c  | 86 ++++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 68 +---------------------
 src/include/access/heapam.h          | 25 +++-----
 3 files changed, 77 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 62404768bef..7f38d815de4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -925,6 +925,31 @@ heap_page_will_set_vm(PruneState *prstate,
 	return true;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1136,23 +1162,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1170,6 +1181,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
 	/* Now update the visibility map and PD_ALL_VISIBLE hint */
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
@@ -1216,12 +1267,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * make everything safe for REDO was logged when the page's tuples
 		 * were frozen.
 		 */
-		Assert(!prstate.all_frozen ||
-			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
 
 		visibilitymap_set(params->relation, blockno, buffer,
 						  InvalidXLogRecPtr,
-						  vmbuffer, presult->vm_conflict_horizon,
+						  vmbuffer, vm_conflict_horizon,
 						  new_vmbits);
 	}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b489349312..f56a02a3d46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,20 +457,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2006,32 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3489,29 +3449,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3535,15 +3472,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ad2af13ec39..bec2f840102 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v29-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.4K, 7-v29-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 53b26c1cf6bf1e37fb5de4576aefee04a50a1f2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v29 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 276 ++++++++++++++++------------
 1 file changed, 157 insertions(+), 119 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7f38d815de4..b66fc6c17e6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid);
 
 
 /*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		do_set_vm &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Helper to correct any corruption detected on an heap page and its
  * corresponding visibility map page after pruning but before setting the
@@ -1003,7 +1070,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1011,10 +1077,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 	uint8		new_vmbits = 0;
 	uint8		old_vmbits = 0;
 
-
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
@@ -1075,6 +1141,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									old_vmbits, new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1096,14 +1193,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1117,6 +1217,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+									 params->relation->rd_locator);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1124,29 +1244,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1156,43 +1259,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1207,7 +1275,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1217,67 +1286,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	do_set_vm = heap_page_will_set_vm(&prstate,
-									  params->relation,
-									  blockno,
-									  buffer,
-									  page,
-									  vmbuffer,
-									  prstate.lpdead_items,
-									  &old_vmbits,
-									  &new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	/* Set the visibility map and page visibility hint, if relevant */
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		Assert(prstate.all_visible);
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled). Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * The heap buffer must be marked dirty before adding it to the WAL
-		 * chain when setting the VM. We don't worry about unnecessarily
-		 * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
-		 * It is extremely rare to have a clean heap buffer with
-		 * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
-		 * point in optimizing it.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
-		visibilitymap_set(params->relation, blockno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, vm_conflict_horizon,
-						  new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	/* Save the vmbits for caller */
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = new_vmbits;
 }
 
 
-- 
2.43.0
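
For readers following the conflict-horizon reasoning in get_conflict_xid() above, here is a minimal standalone sketch of the same decision order. It is an illustration under stated assumptions, not the patch's code: sketch_conflict_xid() and xid_follows() are hypothetical names, TransactionId is reduced to a plain uint32_t, and xid_follows() is a simplified version of the wraparound-aware comparison that TransactionIdFollows() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Simplified wraparound-aware test: is id1 logically newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Pick one snapshot conflict horizon for the combined record. */
static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Caller's invariant: no VM bits are reported unless the VM is set. */
	assert(do_set_vm || new_vmbits == 0);

	/*
	 * Only promoting an already all-visible page to all-frozen: every
	 * tuple is already visible to all snapshots, so no horizon is needed.
	 */
	if (!do_prune && !do_freeze && do_set_vm &&
		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		return InvalidTransactionId;

	/* Start from the VM horizon, else the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer removed xmax takes precedence over either of those. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

The ordering is the point of the sketch: the early return covers the one case where no conflict is possible, and the final comparison ensures the record carries the newest of the candidate horizons.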



  [text/x-patch] v29-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v29-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 6f085fbf06d63c7427946cba50917c7fdf058fae Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v29 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f56a02a3d46..d22d2a86ed0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1867,9 +1867,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1886,13 +1889,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v29-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (24.7K, 9-v29-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From edb532823e2eebae3edf7c68e7dc5dfa8bd3f509 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v29 08/15] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

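This changes the visibility map API: visibilitymap_set() now has the
signature formerly used by visibilitymap_set_vmbits() and no longer emits
WAL itself. Any external users/consumers of the old API or of the VM-only
WAL record will need to change.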

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 373 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b66fc6c17e6..538d06f8449 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1233,8 +1233,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * so there is no point in optimizing it.
 			 */
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
-									 params->relation->rd_locator);
+			visibilitymap_set(blockno, vmbuffer, new_vmbits,
+							  params->relation->rd_locator);
 		}
 
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d22d2a86ed0..93f0f39c5f0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1889,11 +1889,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2771,9 +2771,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
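
For readers following the 0008 patch above: after the rename,
visibilitymap_set() only flips bits in an already pinned VM page, and the
WAL record comes from the caller. Below is a minimal standalone sketch of
that bit arithmetic, not the PostgreSQL source. It assumes 8 kB blocks and
a 24-byte page header (hence a MAPSIZE of 8168); the vm_* functions and
VM_* macros are hypothetical stand-ins for the HEAPBLK_TO_* macros and
VISIBILITYMAP_* flags.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 2 VM bits per heap block, 8 kB pages, 24-byte header. */
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4
#define MAPSIZE				8168
#define HEAPBLOCKS_PER_PAGE	(MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM block covers this heap block? */
static uint32_t
vm_map_block(uint32_t heap_blk)
{
	return heap_blk / HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM block's map area? */
static uint32_t
vm_map_byte(uint32_t heap_blk)
{
	return (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within the byte. */
static uint8_t
vm_bit_offset(uint32_t heap_blk)
{
	return (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/*
 * Set 'flags' for heap_blk in the map area of one VM page and return the
 * bits that were set beforehand. No locking, dirtying, or WAL here; in
 * the patch those are the caller's job.
 */
static uint8_t
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = vm_map_byte(heap_blk);
	uint8_t		offset = vm_bit_offset(heap_blk);
	uint8_t		status;

	assert((flags & VM_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible */
	assert(flags != VM_ALL_FROZEN);

	status = (map[map_byte] >> offset) & VM_VALID_BITS;
	if (flags != status)
		map[map_byte] |= (uint8_t) (flags << offset);

	return status;
}
```

With these constants, heap block 5 lands in byte 1 at bit offset 2, so
setting both flags ORs 0x0C into that byte.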



  [text/x-patch] v29-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (2.2K, 10-v29-0009-Simplify-heap_page_would_be_all_visible-visibili.patch)
From eee8545874d8553cc74042fd6e3110cc38a71be4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v29 09/15] Simplify heap_page_would_be_all_visible visibility
 check

heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.

Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().

This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
---
 src/backend/access/heap/vacuumlazy.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93f0f39c5f0..ff297b0b025 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	{
 		ItemId		itemid;
 		HeapTupleData tuple;
+		TransactionId dead_after = InvalidTransactionId;
 
 		/*
 		 * Set the offset number so that we can display it along with any
@@ -3576,12 +3577,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
 		{
 			case HEAPTUPLE_LIVE:
 				{
 					TransactionId xmin;
 
+					Assert(!TransactionIdIsValid(dead_after));
+
 					/* Check comments in lazy_scan_prune. */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
@@ -3614,8 +3617,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				}
 				break;
 
-			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_RECENTLY_DEAD:
+				Assert(TransactionIdIsValid(dead_after));
+				/* FALLTHROUGH */
+			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				{
-- 
2.43.0
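
The reasoning in the 0009 commit message above can be reduced to a small
standalone sketch: for the all-visible decision, every status other than
"live" is handled the same way. The TupleStatus enum and
page_would_be_all_visible() below are hypothetical simplifications, not
the PostgreSQL definitions; the real function additionally checks the
xmin of each live tuple, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the HEAPTUPLE_* statuses named in the patch. */
typedef enum
{
	TUPLE_LIVE,
	TUPLE_DEAD,
	TUPLE_RECENTLY_DEAD,
	TUPLE_INSERT_IN_PROGRESS,
	TUPLE_DELETE_IN_PROGRESS
} TupleStatus;

/*
 * Return true only if every tuple on the page is live. Dead and
 * recently-dead tuples are not distinguished, which is why the caller
 * does not need an OldestXmin-based classification.
 */
static bool
page_would_be_all_visible(const TupleStatus *statuses, size_t ntuples)
{
	assert(statuses != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		switch (statuses[i])
		{
			case TUPLE_LIVE:
				break;
			case TUPLE_RECENTLY_DEAD:
			case TUPLE_DEAD:
			case TUPLE_INSERT_IN_PROGRESS:
			case TUPLE_DELETE_IN_PROGRESS:
				return false;
		}
	}

	return true;
}
```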



  [text/x-patch] v29-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.9K, 11-v29-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From ca02669803f9af01e4e7e3767a3e1a08d931bd0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v29 10/15] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
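
As a rough illustration of the deferred check described above (a
standalone sketch, not code from this patch): track the newest xmin while
scanning the page, then test only that one XID against the horizon. The
VisHorizon struct is a hypothetical single-XID stand-in for
GlobalVisState, xid_precedes() imitates the modulo-2^32 comparison style
of TransactionIdPrecedes() while ignoring special XIDs, and the checks
counter exists only to make the once-per-page property observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Hypothetical stand-in for GlobalVisState: one horizon XID. */
typedef struct
{
	Xid			horizon;	/* XIDs older than this are visible to all */
	int			checks;		/* number of horizon checks performed */
} VisHorizon;

/* Wraparound-aware "a is older than b" (special XIDs not handled). */
static bool
xid_precedes(Xid a, Xid b)
{
	return (int32_t) (a - b) < 0;
}

/* The comparatively expensive check that we want to run once per page. */
static bool
xid_maybe_running(VisHorizon *vis, Xid xid)
{
	vis->checks++;
	return !xid_precedes(xid, vis->horizon);
}

/*
 * xmins[] holds the xmin of each live, committed tuple on the page. If
 * the newest of them is visible to all, every older one is as well.
 */
static bool
page_all_visible(VisHorizon *vis, const Xid *xmins, size_t ntuples)
{
	Xid			newest;

	assert(vis != NULL);

	if (ntuples == 0)
		return true;		/* an empty page has nothing to hide */

	newest = xmins[0];
	for (size_t i = 1; i < ntuples; i++)
	{
		if (xid_precedes(newest, xmins[i]))
			newest = xmins[i];
	}

	return !xid_maybe_running(vis, newest);
}
```

The trade-off matches the one noted above: the loop always visits every
xmin, but the horizon comparison runs once regardless of tuple count.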

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 53 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 38 ++++++++++-----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index bf899c2d2c6..7d9bd28d8f0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 538d06f8449..54e60e2c635 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -1001,14 +1002,14 @@ heap_page_will_set_vm(PruneState *prstate,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1095,6 +1096,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1276,10 +1287,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1800,28 +1810,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid on
+				 * the page to be running. If so, we don't consider the page
+				 * all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ff297b0b025..94f8546be95 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2725,7 +2725,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3486,7 +3486,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3502,7 +3502,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 					Assert(!TransactionIdIsValid(dead_after));
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3594,16 +3594,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3634,6 +3635,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bec2f840102..1625b107575 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -439,7 +439,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -453,6 +453,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
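
The deferred check described in the commit message above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ToyVisState, ToyTuple, and every toy_* / xid_* name below are hypothetical and are not part of the patch, and the real GlobalVisState tracks more than the single horizon modeled here. The point is the control flow: track the newest committed xmin while scanning, then test it against the horizon once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

/* XIDs below this value are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((Xid) 3)

/* Hypothetical stand-in for GlobalVisState: a single horizon XID. */
typedef struct ToyVisState
{
	Xid			maybe_needed;	/* XIDs >= this may still be seen as running */
} ToyVisState;

/* Hypothetical live tuple: inserting XID plus its commit status. */
typedef struct ToyTuple
{
	Xid			xmin;
	bool		xmin_committed;
} ToyTuple;

static bool
xid_is_normal(Xid xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Wraparound-aware comparisons, valid for normal XIDs only. */
static bool
xid_precedes(Xid a, Xid b)
{
	return (int32_t) (a - b) < 0;
}

static bool
xid_follows(Xid a, Xid b)
{
	return (int32_t) (a - b) > 0;
}

/* Toy analogue of the wrapper: the negation of "removable". */
static bool
toy_xid_maybe_running(const ToyVisState *vis, Xid xid)
{
	assert(xid_is_normal(xid));
	return !xid_precedes(xid, vis->maybe_needed);
}

/*
 * Scan all live tuples, tracking only the newest normal xmin. The
 * comparatively expensive horizon test runs once, after the loop.
 */
static bool
toy_page_all_visible(const ToyTuple *tuples, size_t ntuples,
					 const ToyVisState *vis, Xid *cutoff)
{
	*cutoff = 0;				/* invalid XID: nothing tracked yet */

	for (size_t i = 0; i < ntuples; i++)
	{
		Xid			xmin = tuples[i].xmin;

		/* An uncommitted inserter rules the page out immediately. */
		if (!tuples[i].xmin_committed)
			return false;

		/* Track newest xmin on the page; special XIDs are skipped. */
		if (xid_is_normal(xmin) &&
			(!xid_is_normal(*cutoff) || xid_follows(xmin, *cutoff)))
			*cutoff = xmin;
	}

	/* One horizon check for the whole page. */
	if (xid_is_normal(*cutoff) && toy_xid_maybe_running(vis, *cutoff))
		return false;

	return true;
}
```

With a horizon of 100, a page whose committed xmins are 50, 90, and 70 passes with a cutoff of 90, while a page containing xmin 120 fails only at the final check. That mirrors the trade-off noted in the commit message: more xmins may be examined before a page is ruled out.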



  [text/x-patch] v29-0011-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 12-v29-0011-Unset-all_visible-sooner-if-not-freezing.patch)
From 097858a0e31bc307169aee8f54c1c693fb8cdc23 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v29 11/15] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 54e60e2c635..8df81833179 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1675,8 +1675,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1936,8 +1941,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0
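
As a rough illustration of the change above, the following standalone model shows the two behaviours side by side. It is a sketch only: ToyPruneState, toy_record_dead(), and toy_finish_page() are hypothetical names, and the real PruneState carries far more state. It assumes, as the patched comments state, that the flags are cleared at the end of page processing whenever dead items remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ITEMS 16

/* Hypothetical, heavily reduced stand-in for PruneState. */
typedef struct ToyPruneState
{
	bool		attempt_freeze; /* will this caller try to freeze? */
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
	uint16_t	deadoffsets[TOY_MAX_ITEMS];
} ToyPruneState;

/*
 * Record a dead item. When freezing may be attempted, the flags are left
 * alone so that dead items do not preclude freezing the page. Otherwise
 * there is nothing to wait for, so clear them right away.
 */
static void
toy_record_dead(ToyPruneState *st, uint16_t offnum)
{
	assert(st->lpdead_items < TOY_MAX_ITEMS);

	if (!st->attempt_freeze)
	{
		st->all_visible = false;
		st->all_frozen = false;
	}

	st->deadoffsets[st->lpdead_items++] = offnum;
}

/*
 * End-of-page step in this model: a page that still has dead items can
 * never be marked all-visible, regardless of when the flags were cleared.
 */
static void
toy_finish_page(ToyPruneState *st)
{
	if (st->lpdead_items > 0)
	{
		st->all_visible = false;
		st->all_frozen = false;
	}
}
```

Both paths end in the same state; the early clear only lets later per-tuple bookkeeping be skipped when no freeze plan will be prepared.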



  [text/x-patch] v29-0012-Track-which-relations-are-modified-by-a-query.patch (2.5K, 13-v29-0012-Track-which-relations-are-modified-by-a-query.patch)
From ea4f0d5c34c303e3cf99f9e240e5c0a1db088cf0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v29 12/15] Track which relations are modified by a query

Save the range table indexes of modified relations in a bitmap in the
EState. A later commit will pass this information down to scan nodes to
control whether or not a scan may set the visibility map during on-access
pruning. We don't want to set the visibility map if the query is just
going to modify the page immediately after.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
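
The bookkeeping added by this patch can be modeled in a few lines. The sketch below is illustrative only and makes several assumptions: ToyEState, the toy_* functions, and TOY_HINT_REL_READ_ONLY are hypothetical names, and a 64-bit mask stands in for a Bitmapset, so only range table indexes 1 through 64 are supported here. It shows how a scan could derive a read-only hint from the set of modified relations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hint bit a scan could pass down to the table AM. */
#define TOY_HINT_REL_READ_ONLY (1u << 0)

/* Hypothetical stand-in for the EState field: a 64-bit mask. */
typedef struct ToyEState
{
	uint64_t	modified_relids;	/* bit (rti - 1) set => rti is modified */
} ToyEState;

/* Range table indexes are 1-based, so index 1 maps to bit 0. */
static void
toy_mark_modified(ToyEState *es, int rti)
{
	assert(rti >= 1 && rti <= 64);
	es->modified_relids |= UINT64_C(1) << (rti - 1);
}

static bool
toy_is_modified(const ToyEState *es, int rti)
{
	assert(rti >= 1 && rti <= 64);
	return ((es->modified_relids >> (rti - 1)) & 1) != 0;
}

/*
 * A scan of a relation the query does not modify gets the read-only
 * hint; a scan of a result relation or row-marked relation gets none.
 */
static uint32_t
toy_scan_flags(const ToyEState *es, int scanrelid)
{
	return toy_is_modified(es, scanrelid) ? 0 : TOY_HINT_REL_READ_ONLY;
}
```

For example, if range table entries 2 and 5 are marked as modified, a scan of entry 1 would receive the hint while scans of entries 2 and 5 would not.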



  [text/x-patch] v29-0013-Pass-down-information-on-table-modification-to-s.patch (23.7K, 14-v29-0013-Pass-down-information-on-table-modification-to-s.patch)
From 21c288f212887e37433155ca80093ab3893ea1f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v29 13/15] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 93 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 45d306037a4..5c4bf5f0c6e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 88246071c4b..b63bd24ebfb 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1625b107575..0bfe2366e1a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v29-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.0K, 15-v29-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 7eab00259868aaa07ce3b80f1a01c379eb7f8905 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v29 14/15] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++++-
 src/backend/access/heap/pruneheap.c           | 40 ++++++++++++++++++-
 src/include/access/heapam.h                   | 24 +++++++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8df81833179..3ddb1b396b4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  Relation relation,
 								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 								  Buffer vmbuffer,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
  * corrupted, it will fix them by clearing the VM bits and visibility hint.
  * This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * current value of the VM bits in *old_vmbits and the desired new value of
  * the VM bits in *new_vmbits.
@@ -960,6 +976,8 @@ heap_page_will_set_vm(PruneState *prstate,
 					  Relation relation,
 					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 					  Buffer vmbuffer,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
 					  int nlpdead_items,
 					  uint8 *old_vmbits,
 					  uint8 *new_vmbits)
@@ -967,6 +985,24 @@ heap_page_will_set_vm(PruneState *prstate,
 	if (!prstate->attempt_update_vm)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
 										   &vmbuffer);
 
@@ -1164,6 +1200,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  buffer,
 									  page,
 									  vmbuffer,
+									  params->reason,
+									  do_prune, do_freeze,
 									  prstate.lpdead_items,
 									  &old_vmbits,
 									  &new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0bfe2366e1a..3328f56c101 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -420,7 +437,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
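
The on-access heuristic that the patch above adds to heap_page_will_set_vm() can be restated as a standalone predicate. This is only a sketch under stated assumptions, not the patch's code: the names SketchReason, PageFacts and skip_on_access_vm_set are hypothetical, and the two booleans buffer_dirty and needs_fpi stand in for the results of BufferIsDirty() and XLogCheckBufferNeedsBackup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PruneReason passed by the caller. */
typedef enum
{
	SKETCH_ON_ACCESS,
	SKETCH_VACUUM
} SketchReason;

/* Hypothetical bundle of the facts the heuristic consults. */
typedef struct
{
	bool		all_visible;	/* page was found to be all-visible */
	bool		do_prune;		/* pruning will modify the page */
	bool		do_freeze;		/* freezing will modify the page */
	bool		buffer_dirty;	/* stands in for BufferIsDirty() */
	bool		needs_fpi;		/* stands in for XLogCheckBufferNeedsBackup() */
} PageFacts;

/*
 * Returns true when an on-access caller should give up on setting the VM:
 * nothing else is changing the page, and setting the VM would either newly
 * dirty the heap page or force a full-page image into the WAL.
 */
static bool
skip_on_access_vm_set(SketchReason reason, const PageFacts *f)
{
	assert(f != NULL);

	return reason == SKETCH_ON_ACCESS &&
		f->all_visible &&
		!f->do_prune && !f->do_freeze &&
		(!f->buffer_dirty || f->needs_fpi);
}
```

Read this way, vacuum never takes the early exit, and an on-access caller only takes it when the VM update would be the sole reason to dirty the page or to emit an FPI.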



  [text/x-patch] v29-0015-Set-pd_prune_xid-on-insert.patch (6.7K, 16-v29-0015-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From c0e6bb1a30761705645110c426e7aa3759a18298 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v29 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
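
The effect of setting pd_prune_xid on insert, as described in the patch above, can be illustrated with a toy model. This is a sketch under stated assumptions rather than PostgreSQL code: XidSketch, PageSketch, page_set_prunable and insert_sets_hint are hypothetical names, the page is reduced to a single field, and the guard mirrors the condition the patch uses in heap_multi_insert() (heap_insert() has no all_frozen_set check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t XidSketch;

#define INVALID_XID			((XidSketch) 0)
#define FIRST_NORMAL_XID	((XidSketch) 3)	/* 1 and 2 are special xids */

/* Toy page: only the prune hint is modelled. */
typedef struct
{
	XidSketch	prune_xid;
} PageSketch;

static bool
xid_is_normal(XidSketch xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Circular comparison, valid for normal xids only. */
static bool
xid_precedes(XidSketch a, XidSketch b)
{
	return (int32_t) (a - b) < 0;
}

/* Keep the oldest xid that might make the page worth pruning. */
static void
page_set_prunable(PageSketch *page, XidSketch xid)
{
	assert(xid_is_normal(xid));

	if (page->prune_xid == INVALID_XID ||
		xid_precedes(xid, page->prune_xid))
		page->prune_xid = xid;
}

/*
 * Hint-setting step of an insert: skipped when the page is being set
 * all-frozen anyway, or when the xid is not normal (bootstrap mode).
 */
static void
insert_sets_hint(PageSketch *page, XidSketch xid, bool all_frozen_set)
{
	if (!all_frozen_set && xid_is_normal(xid))
		page_set_prunable(page, xid);
}
```

A later reader of the page that sees a valid prune hint has a reason to attempt on-access pruning, which is what gives the VM-setting code from the previous patch a chance to run.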



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-20 12:32  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2025-12-20 12:32 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Sat, 20 Dec 2025 at 02:10, Melanie Plageman
<[email protected]> wrote:
>
> Attached v29 addresses some feedback and also corrects a small error
> with the assertion I had added in the previous version's 0009.
>
> On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <[email protected]> wrote:
> >
> > I’ve done a basic review of patches 1 and 2. Here are some comments
> > which may be somewhat immature, as this is a fairly large change set
> > and I’m new to some parts of the code.
> >
> > 1) Potential stale old_vmbits after VM repair in v2
>
> Good catch! I've fixed this in attached v29.
>
> > 2) Add Assert(BufferIsDirty(buf))
> >
> > Since the patch's core claim is "buffer must be dirty before WAL
> > registration", an assertion encodes this invariant. Should we add:
> >
> > Assert(BufferIsValid(buf));
> > Assert(BufferIsDirty(buf));
> >
> > right before the visibilitymap_set() call?
>
> There are already assertions that will trip in various places -- most
> importantly in XLogRegisterBuffer(), which is the one that inspired
> this refactor.
>
> > The comment at lines:
> > > "The only scenario where it is not already dirty is if the VM was removed…"
> >
> > This phrasing could become misleading after future refactors. Can we
> > make it more direct like:
> >
> > > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
>
> I see your point about future refactors missing updating comments like
> this. But, I don't think we are going to refactor the code such that
> we can have PD_ALL_VISIBLE set without the VM bits set more often.
> Also, it is common practice in Postgres to describe very specific edge
> cases or odd scenarios in order to explain code that may seem
> confusing without the comment. It does risk that comment later
> becoming stale, but it is better that future developers understand why
> the code is there.
>
> That being said, I take your point that the comment is confusing. I
> have updated it in a different way.
>
> > > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
> >
> > In this test we now call MarkBufferDirty() on the heap page even when
> > only setting the VM, so the comments claiming “does not need to modify
> > the heap buffer”/“no heap page modification” might be misleading. It
> > might be better to say the test doesn’t need to modify heap
> > tuples/page contents or doesn’t need to prune/freeze.
>
> The point I'm trying to make is that we have to dirty the buffer even
> if we don't modify the page because of the XLOG sub-system
> requirements. And, it may seem like a waste to do that if not
> modifying the page, but the page will rarely be clean anyway. I've
> tried to make this more clear in attached v29.
>
> - Melanie


Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. The
origins of this code trace back to fdf9e21196a6, which was committed as
part of [0], at which point there was no
HeapTupleSatisfiesVacuumHorizon function. I guess this is the reason
this optimization was not performed earlier.

I also think this patch is correct, because we do similar things for
HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and
HeapTupleSatisfiesVacuum is just a proxy to
HeapTupleSatisfiesVacuumHorizon, with the only difference being in the
DEAD vs RECENTLY_DEAD handling.
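
To make the proxy relationship concrete, here is a small self-contained model of it. It is a sketch under stated assumptions, not the heapam code: TupleSketch, TupleStatus and both function names are hypothetical, and visibility is reduced to "inserter aborted" and "deleted by a committed xid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
	T_LIVE,
	T_RECENTLY_DEAD,
	T_DEAD
} TupleStatus;

/* Toy tuple: just enough state to show the DEAD/RECENTLY_DEAD split. */
typedef struct
{
	bool		insert_aborted;		/* inserting transaction aborted */
	uint32_t	committed_deleter;	/* 0 if not deleted */
} TupleSketch;

/*
 * Horizon-style check: never consults a cutoff itself. For a committed
 * delete it reports RECENTLY_DEAD and hands back the deleter's xid.
 */
static TupleStatus
satisfies_vacuum_horizon(const TupleSketch *tup, uint32_t *dead_after)
{
	*dead_after = 0;

	if (tup->insert_aborted)
		return T_DEAD;
	if (tup->committed_deleter != 0)
	{
		*dead_after = tup->committed_deleter;
		return T_RECENTLY_DEAD;
	}
	return T_LIVE;
}

/*
 * Wrapper: identical result, except that RECENTLY_DEAD is upgraded to
 * DEAD once the deleter is older than the caller's cutoff.
 */
static TupleStatus
satisfies_vacuum(const TupleSketch *tup, uint32_t oldest_xmin)
{
	uint32_t	dead_after;
	TupleStatus res = satisfies_vacuum_horizon(tup, &dead_after);

	if (res == T_RECENTLY_DEAD)
	{
		assert(dead_after != 0);
		if ((int32_t) (dead_after - oldest_xmin) < 0)
			res = T_DEAD;
	}
	return res;
}
```

A caller that treats DEAD and RECENTLY_DEAD the same way gets the same answer from either function, so it can call the horizon variant directly and skip the cutoff comparison.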


A similar change could be made in heapam_scan_analyze_next_tuple:

...
case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
...



[0] https://www.postgresql.org/message-id/CABOikdP0meGuXPPWuYrP%3DvDvoqUdshF2xJAzZHWSKg03Rz_%2B9Q%40mail...


-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-22 07:19  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Chao Li @ 2025-12-22 07:19 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Dec 20, 2025, at 05:09, Melanie Plageman <[email protected]> wrote:
> 
> Attached v29 addresses some feedback and also corrects a small error
> with the assertion I had added in the previous version's 0009.
> 
> On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <[email protected]> wrote:
>> 
>> I’ve done a basic review of patches 1 and 2. Here are some comments
>> which may be somewhat immature, as this is a fairly large change set
>> and I’m new to some parts of the code.
>> 
>> 1) Potential stale old_vmbits after VM repair in v2
> 
> Good catch! I've fixed this in attached v29.
> 
>> 2) Add Assert(BufferIsDirty(buf))
>> 
>> Since the patch's core claim is "buffer must be dirty before WAL
>> registration", an assertion encodes this invariant. Should we add:
>> 
>> Assert(BufferIsValid(buf));
>> Assert(BufferIsDirty(buf));
>> 
>> right before the visibilitymap_set() call?
> 
> There are already assertions that will trip in various places -- most
> importantly in XLogRegisterBuffer(), which is the one that inspired
> this refactor.
> 
>> The comment at lines:
>>> "The only scenario where it is not already dirty is if the VM was removed…"
>> 
>> This phrasing could become misleading after future refactors. Can we
>> make it more direct like:
>> 
>>> "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
> 
> I see your point about future refactors missing updating comments like
> this. But, I don't think we are going to refactor the code such that
> we can have PD_ALL_VISIBLE set without the VM bits set more often.
> Also, it is common practice in Postgres to describe very specific edge
> cases or odd scenarios in order to explain code that may seem
> confusing without the comment. It does risk that comment later
> becoming stale, but it is better that future developers understand why
> the code is there.
> 
> That being said, I take your point that the comment is confusing. I
> have updated it in a different way.
> 
>>> "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
>> 
>> In this test we now call MarkBufferDirty() on the heap page even when
>> only setting the VM, so the comments claiming “does not need to modify
>> the heap buffer”/“no heap page modification” might be misleading. It
>> might be better to say the test doesn’t need to modify heap
>> tuples/page contents or doesn’t need to prune/freeze.
> 
> The point I'm trying to make is that we have to dirty the buffer even
> if we don't modify the page because of the XLOG sub-system
> requirements. And, it may seem like a waste to do that if not
> modifying the page, but the page will rarely be clean anyway. I've
> tried to make this more clear in attached v29.
> 
> - Melanie
> <v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch><v29-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch><v29-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch><v29-0005-Move-VM-assert-into-prune-freeze-code.patch><v29-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v29-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v29-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v29-0009-Simplify-heap_page_would_be_all_visible-visibili.patch><v29-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v29-0011-Unset-all_visible-sooner-if-not-freezing.patch><v29-0012-Track-which-relations-are-modified-by-a-query.patch><v29-0013-Pass-down-information-on-table-modification-to-s.patch><v29-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch><v29-0015-Set-pd_prune_xid-on-insert.patch>

A few more comments on v29:

1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.

2 - 0003
```
+ * Helper to correct any corruption detected on an heap page and its
```

Nit: “an” -> “a”

3 - 0003
```
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
```

Right before this function is called:
```
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
+	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+									   presult.lpdead_items, vmbuffer,
+									   old_vmbits))
```

So the Assert() only checks that old_vmbits is what visibilitymap_get_status() has just returned. In that case, identify_and_fix_vm_corruption() could take vmbits as a pointer, call visibilitymap_get_status() itself, and return vmbits via the pointer, so that we don’t need to call visibilitymap_get_status() twice.

4 - 0004
```
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
```

The comment feels a little confusing to me. “HEAP_PAGE_PRUNE_UPDATE_VM option is set” is a clear indication, but how do we decide that “we have attempted to update the VM”? Reading the code:
```
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
```

It’s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don’t need the “and” part.

5 - 0004
```
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	if (!prstate->attempt_update_vm)
+		return false;
```

old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.

I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().

6 - 0004
```
@@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
+
 
 	/* Initialize prstate */
```

Nit: an extra empty line is added.

7 - 0005
```
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
```

Nit: a trailing period is needed at the end of the comment line.

8 - 0005
```
@@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
```

I guess the variable name “vm_conflict_horizon” comes from the old “presult->vm_conflict_horizon”. But in the new logic, this variable is used more generically, for example in Assert(debug_cutoff == vm_conflict_horizon). I see 0006 renames it to “conflict_xid”, so it’s up to you whether to rename it here. But to make the commit self-contained, I’d suggest renaming it.

9 - 0006
```
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	{
 		ItemId		itemid;
 		HeapTupleData tuple;
+		TransactionId dead_after = InvalidTransactionId;
```

This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always assign a value to it.

10 - 0010
```
+				 * there is any snapshot that still consider the newest xid on
```

Nit: consider -> considers

11 - 0011
```
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
```

The comment says “just unset all-visible”, but the code also unsets all_frozen.

12 - 0012
```
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
```

As we intentionally only want indexes, does it make sense to name the field es_modified_rtindexes to make it more explicit?

13 - 0012
```
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
```

I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn’t always modify tuples of the relation. A reader who doesn’t associate this code with this patch may think the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase it like “If it has a rowmark, the relation may modify or lock heap pages”.

14 - 0015 - commit message
```
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affetcting the
```

Typo: affetcting -> affecting

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-22 17:57  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2025-12-22 17:57 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Dec 22, 2025 at 2:20 AM Chao Li <[email protected]> wrote:
>
> A few more comments on v29:

Thanks for the continued review! I've attached v30.

> 1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.

I was torn about whether or not to change the return value. Coverity
doesn't always warn about unused return values. Usually it warns if it
perceives the return value as needed for error checking or if it
thinks not using the return value is incorrect. It may still warn in
this case, but it's not obvious to me which way it would go.

I have changed the function signature as you suggested in v30.

My hesitation is that visibilitymap_set() is in a header file and
could be used by extensions/forks, etc. Adding more information by
changing a return value from void to non-void doesn't have any
negative effect on those potential callers. But taking away a return
value is more likely to affect them in a potentially negative way.

However, I'm significantly changing the signature in this release, so
everybody that used it will have to change their code completely
anyway. Also, I just added a return value for visibilitymap_set() in
the previous release (18). Historically, it returned void. So, I've
gone with your suggestion.

> ```
> +static bool
> +identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
> +                                                          BlockNumber heap_blk, Page heap_page,
> +                                                          int nlpdead_items,
> +                                                          Buffer vmbuffer,
> +                                                          uint8 vmbits)
> +{
> +       Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
> ```
>
> Right before this function is called:
> ```
>         old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
> +       if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
> +                                                                          presult.lpdead_items, vmbuffer,
> +                                                                          old_vmbits))
> ```
>
> So the Assert() only checks that old_vmbits is what visibilitymap_get_status() has just returned. In that case, identify_and_fix_vm_corruption() could take vmbits as a pointer, call visibilitymap_get_status() itself, and return vmbits via the pointer, so that we don’t need to call visibilitymap_get_status() twice.

I see what you are saying, and I did consider this.
visibilitymap_get_status() is only called the second time in assert
builds, and it isn't so expensive to do it that it is worth worrying
about.  I added the assertion to prevent other callers from calling
identify_and_fix_vm_corruption() with random VM bits unassociated with
the vmbuffer passed in.

The reason I don't think identify_and_fix_vm_corruption() should be
the one to call visibilitymap_get_status() and initialize old_vmbits
is that it shouldn't be a required step to setting the VM.
identify_and_fix_vm_corruption()'s job is to identify and fix
corruption -- not get the VM bits for when we set them. In fact, it
may make sense someday to check that the VM and PD_ALL_VISIBLE are in
sync before pruning and freezing is even started. (Of course, we can't
check the number of lpdead items until after).

Regarding having *old_vmbits as a return value: I thought about
directly returning the result of visibilitymap_clear() from
identify_and_fix_vm_corruption(). The reason I didn't is that if
PD_ALL_VISIBLE is set and nlpdead_items > 0 but the VM is clear,
visibilitymap_clear() will return false -- because it didn't need to
clear the VM bits. And I think we want
identify_and_fix_vm_corruption() to return true if it cleared
corruption at all.

I don't think we should have identify_and_fix_vm_corruption() reset
old_vmbits to 0 (and pass it by reference), because the caller may
want to know the value of old_vmbits before we cleared corruption.
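
The return-value reasoning above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyPage, toy_vm_clear() and toy_fix_vm_corruption() are hypothetical stand-ins rather than the functions in the patch, and the two corruption cases are assumed from the checks that lazy_scan_prune() performs. It shows why the helper reports "corruption fixed" separately from what the VM-clear call returns, and why taking vmbits by value leaves the caller's copy untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a heap page plus its VM bits; not PostgreSQL's types. */
typedef struct ToyPage
{
	bool		pd_all_visible; /* models PD_ALL_VISIBLE on the heap page */
	uint8_t		vmbits;			/* models this page's bits in the VM */
} ToyPage;

/* Models visibilitymap_clear(): true only if a VM bit was actually cleared. */
static bool
toy_vm_clear(ToyPage *page)
{
	bool		cleared = (page->vmbits != 0);

	page->vmbits = 0;
	return cleared;
}

/*
 * Hypothetical analogue of identify_and_fix_vm_corruption(). vmbits is
 * passed by value, so the caller still has the pre-repair bits afterwards.
 * Returns true if any corruption was repaired.
 */
bool
toy_fix_vm_corruption(ToyPage *page, int nlpdead_items, uint8_t vmbits)
{
	/* Catch callers passing bits unrelated to this page (assert builds only). */
	assert(page->vmbits == vmbits);

	/* The VM claims all-visible but the page-level hint is not set. */
	if (vmbits != 0 && !page->pd_all_visible)
	{
		(void) toy_vm_clear(page);
		return true;
	}

	/* The page claims all-visible but still has dead line pointers. */
	if (page->pd_all_visible && nlpdead_items > 0)
	{
		page->pd_all_visible = false;
		(void) toy_vm_clear(page);	/* false if the VM was already clear */
		return true;
	}

	return false;
}
```

With a page whose hint is set, whose VM bits are 0, and which has dead items, toy_vm_clear() has nothing to clear and returns false, yet the helper still returns true because it repaired the page-level hint.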

> ```
> +        * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
> +        * we have attempted to update the VM.
> +        */
> +       uint8           new_vmbits;
> +       uint8           old_vmbits;
> ```
>
> The comment feels a little confusing to me. “HEAP_PAGE_PRUNE_UPDATE_VM option is set” is a clear indication, but how do we decide that “we have attempted to update the VM”? Reading the code:
> ```
> +       prstate->attempt_update_vm =
> +               (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
> ```
>
> It’s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don’t need the “and” part.

Good point. Fixed.

> ```
> +static bool
> +heap_page_will_set_vm(PruneState *prstate,
> +                                         Relation relation,
> +                                         BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
> +                                         Buffer vmbuffer,
> +                                         int nlpdead_items,
> +                                         uint8 *old_vmbits,
> +                                         uint8 *new_vmbits)
> +{
> +       if (!prstate->attempt_update_vm)
> +               return false;
> ```
>
> old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.
>
> I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().

I see your point. It does feel a little bit different to me since they
are local variables and Coverity may not actually be able to tell they
are being unconditionally initialized by heap_page_will_set_vm(). The
other local variables that are not initialized at the top are all
unconditionally set by helper return values. But my decision to
initialize them was more instinct than rationality. I've changed it as
you suggested.
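
As a rough illustration of the change agreed on here, the sketch below zeroes both output parameters at the top of the helper so they are defined even on the early-return path. The function name, the flattened parameter list and the TOY_* bit values are assumptions made for a self-contained example; the real heap_page_will_set_vm() takes a PruneState and buffers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values, not PostgreSQL's definitions. */
#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN	0x02

/*
 * Hypothetical, simplified analogue of heap_page_will_set_vm(). Returns true
 * if at least one VM bit still needs to be set.
 */
bool
toy_will_set_vm(bool attempt_update_vm, bool all_visible, bool all_frozen,
				uint8_t current_vmbits,
				uint8_t *old_vmbits, uint8_t *new_vmbits)
{
	assert(old_vmbits != NULL && new_vmbits != NULL);

	/* Define the outputs on every path, including the early return below. */
	*old_vmbits = 0;
	*new_vmbits = 0;

	if (!attempt_update_vm)
		return false;

	*old_vmbits = current_vmbits;
	if (all_visible)
		*new_vmbits |= TOY_ALL_VISIBLE;
	if (all_visible && all_frozen)
		*new_vmbits |= TOY_ALL_FROZEN;

	/* There is work to do only if a desired bit is not already set. */
	return (*new_vmbits & ~*old_vmbits) != 0;
}
```

A caller can then declare both variables without initializers; whether a static analyzer recognizes that is a separate question, as noted above.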

> ```
> -        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
> -        * will return 'all_visible', 'all_frozen' flags to the caller.
> +        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
> ```
>
> Nit: a trailing period is needed at the end of the comment line.

I've changed it. One interesting thing is that our "policy" for
periods in comments is that we don't put periods at the end of
one-line comments and we do put them at the end of multi-line comment
sentences. This is a one-line comment inside a comment block, so I
wasn't sure what to do. If you noticed it, and it bothered you, it's
easy enough to change, though.

> ```
> @@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>         Buffer          vmbuffer = params->vmbuffer;
>         Page            page = BufferGetPage(buffer);
>         BlockNumber blockno = BufferGetBlockNumber(buffer);
> +       TransactionId vm_conflict_horizon = InvalidTransactionId;
> ```
>
> I guess the variable name “vm_conflict_horizon” comes from the old “presult->vm_conflict_horizon”. But in the new logic, this variable is used more generically, for example in Assert(debug_cutoff == vm_conflict_horizon). I see 0006 renames it to “conflict_xid”, so it’s up to you whether to rename it here. But to make the commit self-contained, I’d suggest renaming it.

As of this patch, it is still being exclusively used as the conflict
XID for setting the visibility map. And it still is the visibility
horizon. I rename it to conflict xid once it includes more than just
the visibility horizon for an all-visible page. In that assertion, it
is also the visibility horizon for an all-visible page.

> 9 - 0006
> ```
> @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
>         {
>                 ItemId          itemid;
>                 HeapTupleData tuple;
> +               TransactionId dead_after = InvalidTransactionId;
> ```
>
> This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always assign a value to it.

I think this is a comment for a later patch in the set (you originally
said it was from 0006), but I've changed dead_after to not be
initialized like this.

> ```
> +       /*
> +        * RT indexes of relations modified by the query either through
> +        * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
> +        */
> +       Bitmapset  *es_modified_relids;
> ```
>
> As we intentionally only want indexes, does it make sense to name the field es_modified_rtindexes to make it more explicit?

I'm torn about this. I named it like this partially because the struct
member two above it in the estate, es_unpruned_relids, is also a
bitmapset of range table indexes and yet is called x_relids. Though
the bitmapset is one of indexes into the range table, they are the
indexes of relation IDs in that range table. I think this could go
either way, so I've left it as is for now and will think more about it
once this patch is closer to being committed.

> ```
> +                       /* If it has a rowmark, the relation is modified */
> +                       estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
> +                                                                                                               rc->rti);
> ```
>
> I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn’t always modify tuples of the relation. A reader who doesn’t associate this code with this patch may think the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase it like “If it has a rowmark, the relation may modify or lock heap pages”.

I see what you are saying. It's a good point. However, the reason we
don't want to set the VM for SELECT FOR UPDATE is not because the
SELECT FOR UPDATE will lock the relation but because it is usually
indicating that we intend to modify the relation (when we do the
update). As such, I've updated the comment to say "If it has a
rowmark, the relation may be modified" -- which leaves it more open.
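
To make the intent concrete, here is a minimal sketch of the bookkeeping being discussed. It assumes that a set of range-table indexes is filled in for relations the query may modify (including ones with a rowmark) and is consulted before an on-access scan sets the VM. ToyEState and both functions are hypothetical, and a 64-bit mask stands in for the Bitmapset that the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy executor state: bit i set means RT index i may be modified. */
typedef struct ToyEState
{
	uint64_t	modified_relids;
} ToyEState;

/* Record that the relation at range-table index rti may be modified. */
void
toy_mark_modified(ToyEState *estate, int rti)
{
	assert(rti >= 1 && rti < 64);	/* RT indexes start at 1 */
	estate->modified_relids |= UINT64_C(1) << rti;
}

/*
 * A scan of a relation the query is expected to modify skips setting the VM,
 * since the page would likely be un-set again shortly afterwards.
 */
bool
toy_may_set_vm_on_access(const ToyEState *estate, int rti)
{
	assert(rti >= 1 && rti < 64);
	return (estate->modified_relids & (UINT64_C(1) << rti)) == 0;
}
```

In this model a SELECT FOR UPDATE on RT index 2 would call toy_mark_modified(&estate, 2), after which scans of that relation would not attempt to set the VM, while other relations in the same query still could.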

- Melanie


Attachments:

  [text/x-patch] v30-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (10.2K, 2-v30-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
  download | inline diff:
From ec1755a3055229d5bef9cc963f8f6b7edb2a1cd3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v30 01/16] Combine visibilitymap_set() cases in
 lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Srinath Reddy Sadipiralla <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 44 ++++++++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 20 +++++
 src/backend/access/heap/vacuumlazy.c          | 87 ++++---------------
 3 files changed, 82 insertions(+), 69 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 --
 -- recently-dropped table
 --
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column? 
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 
 --
 -- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		/*
 		 * It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
+		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+		 * removed -- and that isn't worth optimizing for. And if we add the
+		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+		 * it must be marked dirty.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!presult.all_frozen ||
+			   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0



  [text/x-patch] v30-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (17.0K, 3-v30-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
  download | inline diff:
From f55678510379299ca66cf78fbf6e08ec8ecda0d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v30 02/16] Eliminate use of cached VM value in
 lazy_scan_prune()

lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.

Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.

Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The restructuring also makes it possible for the VM to be
newly set after corruption is fixed, if pruning found the page
all-visible.

Now that no callers of visibilitymap_set() use its return value, change
its return type, and that of visibilitymap_set_vmbits(), to void.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 src/backend/access/heap/vacuumlazy.c    | 182 +++++++++++-------------
 src/backend/access/heap/visibilitymap.c |   9 +-
 src/include/access/visibilitymap.h      |  18 +--
 3 files changed, 94 insertions(+), 115 deletions(-)
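
As a reading aid, the decision flow that lazy_scan_prune() follows after
this patch can be modeled in standalone C. This is a minimal sketch and
not code from the patch: the MODEL_* constants, ModelVmCounters, and
model_vm_update() are hypothetical names, the bit values are assumed to
mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and
old_vmbits is assumed to have already been reset to 0 if corruption was
cleared. The *vm_page_frozen output is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the VM flag bits; hypothetical names. */
#define MODEL_VM_ALL_VISIBLE 0x01
#define MODEL_VM_ALL_FROZEN  0x02

/* Stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct ModelVmCounters
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} ModelVmCounters;

/*
 * Returns true if the model would call visibilitymap_set(), false if it
 * would return early. Counters are only touched when the VM is set.
 */
static bool
model_vm_update(uint8_t old_vmbits, bool all_visible, bool all_frozen,
				ModelVmCounters *counters)
{
	uint8_t		new_vmbits;

	assert(counters != NULL);

	/* Page is not all-visible after pruning: nothing to set. */
	if (!all_visible)
		return false;

	new_vmbits = MODEL_VM_ALL_VISIBLE;
	if (all_frozen)
		new_vmbits |= MODEL_VM_ALL_FROZEN;

	/* VM already matches the desired state. */
	if (old_vmbits == new_vmbits)
		return false;

	if ((old_vmbits & MODEL_VM_ALL_VISIBLE) == 0)
	{
		/* Newly all-visible, and possibly newly all-frozen as well. */
		counters->new_visible++;
		if (all_frozen)
			counters->new_visible_frozen++;
	}
	else if ((old_vmbits & MODEL_VM_ALL_FROZEN) == 0 && all_frozen)
	{
		/* Already all-visible, newly all-frozen. */
		counters->new_frozen++;
	}

	return true;
}
```

In this model, a page that was all-visible but not all-frozen in the VM
and that pruning finds all-frozen only increments new_frozen.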

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..d47ed7814c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
  */
 #define EAGER_SCAN_REGION_SIZE 4096
 
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
 typedef struct LVRelState
 {
 	/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
 	/* State maintained by heap_vac_scan_next_block() */
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
-	bool		next_unskippable_allvis;	/* its visibility status */
 	bool		next_unskippable_eager_scanned; /* if it was eagerly scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
 
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
+							Buffer vmbuffer,
 							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	/* Initialize for the first heap_vac_scan_next_block() call */
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
-	vacrel->next_unskippable_allvis = false;
 	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
 										MAIN_FORKNUM,
 										heap_vac_scan_next_block,
 										vacrel,
-										sizeof(uint8));
+										sizeof(bool));
 
 	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		uint8		blk_info = 0;
+		bool		was_eager_scanned = false;
 		int			ndeleted = 0;
 		bool		has_lpdead_items;
 		void	   *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!BufferIsValid(buf))
 			break;
 
-		blk_info = *((uint8 *) per_buffer_data);
+		was_eager_scanned = *((bool *) per_buffer_data);
 		CheckBufferIsPinnedOnce(buf);
 		page = BufferGetPage(buf);
 		blkno = BufferGetBlockNumber(buf);
 
 		vacrel->scanned_pages++;
-		if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+		if (was_eager_scanned)
 			vacrel->eager_scanned_pages++;
 
 		/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
 									   vmbuffer,
-									   blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
 									   &has_lpdead_items, &vm_page_frozen);
 
 		/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * exclude pages skipped due to cleanup lock contention from eager
 		 * freeze algorithm caps.
 		 */
-		if (got_cleanup_lock &&
-			(blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+		if (got_cleanup_lock && was_eager_scanned)
 		{
 			/* Aggressive vacuums do not eager scan. */
 			Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
 {
 	BlockNumber next_block;
 	LVRelState *vacrel = callback_private_data;
-	uint8		blk_info = 0;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
 		 * otherwise they would've been unskippable.
 		 */
 		vacrel->current_block = next_block;
-		blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		*((uint8 *) per_buffer_data) = blk_info;
+		/* Block was not eager scanned */
+		*((bool *) per_buffer_data) = false;
 		return vacrel->current_block;
 	}
 	else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
 		Assert(next_block == vacrel->next_unskippable_block);
 
 		vacrel->current_block = next_block;
-		if (vacrel->next_unskippable_allvis)
-			blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		if (vacrel->next_unskippable_eager_scanned)
-			blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
-		*((uint8 *) per_buffer_data) = blk_info;
+		*((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
 		return vacrel->current_block;
 	}
 }
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
 	bool		next_unskippable_eager_scanned = false;
-	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 													   next_unskippable_block,
 													   &next_unskippable_vmbuffer);
 
-		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
 		/*
 		 * At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
 		 */
-		if (!next_unskippable_allvis)
+		if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
 			break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
-	vacrel->next_unskippable_allvis = next_unskippable_allvis;
 	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
  * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
 				bool *has_lpdead_items,
 				bool *vm_page_frozen)
 {
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
+	uint8		old_vmbits = 0;
+	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
-		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
-		 * removed -- and that isn't worth optimizing for. And if we add the
-		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
-		 * it must be marked dirty.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!presult.all_frozen ||
-			   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
+	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
 	/*
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
 	 * with buffer lock before concluding that the VM is corrupt.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if (!PageIsAllVisible(page) &&
+		(old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2169,6 +2086,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
+		/* VM bits are now clear */
+		old_vmbits = 0;
 	}
 
 	/*
@@ -2196,6 +2115,71 @@ lazy_scan_prune(LVRelState *vacrel,
 		MarkBufferDirty(buf);
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
+		/* VM bits are now clear */
+		old_vmbits = 0;
+	}
+
+	if (!presult.all_visible)
+		return presult.ndeleted;
+
+	/* Set the visibility map and page visibility hint */
+	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (presult.all_frozen)
+		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	/* Nothing to do */
+	if (old_vmbits == new_vmbits)
+		return presult.ndeleted;
+
+	Assert(presult.all_visible);
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled). Regardless, set both bits so that we get back in
+	 * sync.
+	 *
+	 * The heap buffer must be marked dirty before adding it to the WAL chain
+	 * when setting the VM. We don't worry about unnecessarily dirtying the
+	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+	 * the VM bits clear, so there is no point in optimizing it.
+	 */
+	PageSetAllVisible(page);
+	MarkBufferDirty(buf);
+
+	/*
+	 * If the page is being set all-frozen, we pass InvalidTransactionId as
+	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+	 * everything safe for REDO was logged when the page's tuples were frozen.
+	 */
+	Assert(!presult.all_frozen ||
+		   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+	visibilitymap_set(vacrel->rel, blkno, buf,
+					  InvalidXLogRecPtr,
+					  vmbuffer, presult.vm_conflict_horizon,
+					  new_vmbits);
+
+	/*
+	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
+	 * count it as newly set for logging.
+	 */
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	{
+		vacrel->vm_new_visible_pages++;
+		if (presult.all_frozen)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 presult.all_frozen)
+	{
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..cdcb475e501 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
  * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
  */
-uint8
+void
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	}
 
 	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
 }
 
 /*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  *
  * rlocator is used only for debugging messages.
  */
-uint8
+void
 visibilitymap_set_vmbits(BlockNumber heapBlk,
 						 Buffer vmBuf, uint8 flags,
 						 const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
 		map[mapByte] |= (flags << mapOffset);
 		MarkBufferDirty(vmBuf);
 	}
-
-	return status;
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..787c19e5fef 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+							  BlockNumber heapBlk, Buffer heapBuf,
+							  XLogRecPtr recptr,
+							  Buffer vmBuf,
+							  TransactionId cutoff_xid,
+							  uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+									 Buffer vmBuf, uint8 flags,
+									 const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0



  [text/x-patch] v30-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.7K, 4-v30-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
From d196bdeefae2b14ca3b7abf22b6d6cffca116cd4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v30 03/16] Refactor lazy_scan_prune() VM clear logic into
 helper

Encapsulating the VM corruption checks in a helper makes
lazy_scan_prune() clearer. There is no functional change other than
moving the checks into the helper.

Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
 1 file changed, 85 insertions(+), 47 deletions(-)
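
The two conditions the new helper tests can be summarized by a small
standalone model. This is a sketch under stated assumptions, not the
helper itself: ModelVmCheck, MODEL_VM_VALID_BITS, and model_check_vm()
are hypothetical names, and the model only classifies the state. The
WARNING, the clearing of the bits, and the buffer dirtying are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to cover both VM flag bits; hypothetical name. */
#define MODEL_VM_VALID_BITS 0x03

typedef enum ModelVmCheck
{
	MODEL_VM_OK,					/* nothing to fix */
	MODEL_VM_SET_PAGE_CLEAR,		/* VM bit set, page-level bit clear */
	MODEL_PAGE_SET_WITH_LP_DEAD		/* page-level bit set, LP_DEAD present */
} ModelVmCheck;

/*
 * Classify the heap page / VM state in the same order as the helper:
 * the VM-versus-page mismatch is tested first, then LP_DEAD items on a
 * page marked all-visible.
 */
static ModelVmCheck
model_check_vm(bool page_all_visible, uint8_t vmbits, int nlpdead_items)
{
	assert(nlpdead_items >= 0);

	if (!page_all_visible && (vmbits & MODEL_VM_VALID_BITS) != 0)
		return MODEL_VM_SET_PAGE_CLEAR;

	if (page_all_visible && nlpdead_items > 0)
		return MODEL_PAGE_SET_WITH_LP_DEAD;

	return MODEL_VM_OK;
}
```

Any result other than MODEL_VM_OK corresponds to the helper returning
true, after which the caller treats the old VM bits as 0.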

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d47ed7814c8..c5fc5b71f94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page,
+										   int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1928,6 +1933,83 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,54 +2152,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(page) &&
-		(old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		/* VM bits are now clear */
+	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+									   presult.lpdead_items, vmbuffer,
+									   old_vmbits))
 		old_vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		/* VM bits are now clear */
-		old_vmbits = 0;
-	}
 
 	if (!presult.all_visible)
 		return presult.ndeleted;
-- 
2.43.0



  [text/x-patch] v30-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.8K, 5-v30-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 566794eed6786868a1147e6a0436d74c0603ccdf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v30 04/16] Set the VM in heap_page_prune_and_freeze()

This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 315 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 150 +------------
 src/include/access/heapam.h          |  20 ++
 3 files changed, 299 insertions(+), 186 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..1c1446058a7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+								  Relation relation,
+								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+								  Buffer vmbuffer,
+								  int nlpdead_items,
+								  uint8 *old_vmbits,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is no longer maintained. As long as it
+	 * is maintained, it can be used to calculate the snapshot conflict
+	 * horizon when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	*old_vmbits = 0;
+	*new_vmbits = 0;
+
+	if (!prstate->attempt_update_vm)
+		return false;
+
+	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
+										   &vmbuffer);
+
+	/* Check for and fix corruption even if the page is not all-visible */
+	if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+									   nlpdead_items, vmbuffer,
+									   *old_vmbits))
+		*old_vmbits = 0;
+
+	if (!prstate->all_visible)
+		return false;
+
+	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->all_frozen)
+		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (*new_vmbits == *old_vmbits)
+	{
+		*new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * If we do not intend to set the VM, new_vmbits should be 0 regardless of
+	 * whether the page is all-visible.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	/* Set the visibility map and page visibility hint, if relevant */
+	if (do_set_vm)
+	{
+		Assert(prstate.all_visible);
+
+		/*
+		 * It should never be the case that the visibility map bit is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled). Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * The heap buffer must be marked dirty before adding it to the WAL
+		 * chain when setting the VM. We don't worry about unnecessarily
+		 * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+		 * It is extremely rare to have a clean heap buffer with
+		 * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+		 * point in optimizing it.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!prstate.all_frozen ||
+			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+		visibilitymap_set(params->relation, blockno, buffer,
+						  InvalidXLogRecPtr,
+						  vmbuffer, presult->vm_conflict_horizon,
+						  new_vmbits);
+	}
+
+	/* Save the vmbits for caller */
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = new_vmbits;
 }
 
 
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c5fc5b71f94..8b489349312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 vmbits);
+
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1933,83 +1929,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2041,13 +1960,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2147,75 +2065,25 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-									   presult.lpdead_items, vmbuffer,
-									   old_vmbits))
-		old_vmbits = 0;
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	Assert(presult.all_visible);
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled). Regardless, set both bits so that we get back in
-	 * sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
 
 	/*
 	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
 	 * count it as newly set for logging.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if (presult.all_frozen)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..0913759219c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * If we will consider updating the visibility map, vmbuffer should
+	 * contain the correct block of the visibility map and be pinned.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
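
For readers following along without the tree checked out, here is a rough
standalone sketch of the two pieces of logic the patch above moves around: the
choice of new VM bits at the end of heap_page_will_set_vm(), and the counter
bookkeeping that lazy_scan_prune() now does from presult.old_vmbits and
presult.new_vmbits. It is not code from the patch set. The VmCounters struct
and the sketch_* function names are made up for illustration, and the two flag
values are repeated from visibilitymapdefs.h so that the snippet compiles
without PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as visibilitymapdefs.h; repeated so the sketch is standalone. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the vm_new_* counters kept in LVRelState. */
typedef struct VmCounters
{
	int			new_visible_pages;
	int			new_visible_frozen_pages;
	int			new_frozen_pages;
} VmCounters;

/*
 * Mirrors the tail of heap_page_will_set_vm(): returns the bits to set, or 0
 * if the page is not all-visible or the VM already has exactly those bits.
 */
static uint8_t
sketch_new_vmbits(bool all_visible, bool all_frozen, uint8_t old_vmbits)
{
	uint8_t		new_vmbits;

	if (!all_visible)
		return 0;

	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (all_frozen)
		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	return (new_vmbits == old_vmbits) ? 0 : new_vmbits;
}

/* Mirrors the accounting in lazy_scan_prune() after this patch. */
static void
sketch_count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page was not all-visible in the VM before and is now. */
		c->new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
			c->new_visible_frozen_pages++;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Page was already all-visible and is newly all-frozen. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen_pages++;
	}
}
```

Because new_vmbits is 0 whenever the VM is left alone, the caller can test the
new bits directly instead of returning early when nothing changed, which is
why the added "!= 0" checks appear in the lazy_scan_prune() hunk.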



  [text/x-patch] v30-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v30-0005-Move-VM-assert-into-prune-freeze-code.patch)
From 9f5072500e2a3bc2f2a8490f1ca11bf60a81515a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v30 05/16] Move VM assert into prune/freeze code

This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 86 ++++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 68 +---------------------
 src/include/access/heapam.h          | 25 +++-----
 3 files changed, 77 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1c1446058a7..7af6aea2d0e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
 	return true;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already, so we don't need to include it again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
 	/* Now update the visibility map and PD_ALL_VISIBLE hint */
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * make everything safe for REDO was logged when the page's tuples
 		 * were frozen.
 		 */
-		Assert(!prstate.all_frozen ||
-			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
 
 		visibilitymap_set(params->relation, blockno, buffer,
 						  InvalidXLogRecPtr,
-						  vmbuffer, presult->vm_conflict_horizon,
+						  vmbuffer, vm_conflict_horizon,
 						  new_vmbits);
 	}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b489349312..f56a02a3d46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,20 +457,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2006,32 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3489,29 +3449,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3535,15 +3472,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0913759219c..88e79c58a10 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v30-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.2K, 7-v30-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From eb94a7df040b6250d3ea3e0d1a79f24a3dc4fd6a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v30 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
 1 file changed, 157 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7af6aea2d0e..49d3ebb0063 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid);
 
 
 /*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		do_set_vm &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Helper to correct any corruption detected on a heap page and its
  * corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 	uint8		new_vmbits;
 	uint8		old_vmbits;
 
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									old_vmbits, new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+									 params->relation->rd_locator);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	do_set_vm = heap_page_will_set_vm(&prstate,
-									  params->relation,
-									  blockno,
-									  buffer,
-									  page,
-									  vmbuffer,
-									  prstate.lpdead_items,
-									  &old_vmbits,
-									  &new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	/* Set the visibility map and page visibility hint, if relevant */
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		Assert(prstate.all_visible);
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled). Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * The heap buffer must be marked dirty before adding it to the WAL
-		 * chain when setting the VM. We don't worry about unnecessarily
-		 * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
-		 * It is extremely rare to have a clean heap buffer with
-		 * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
-		 * point in optimizing it.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
-		visibilitymap_set(params->relation, blockno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, vm_conflict_horizon,
-						  new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	/* Save the vmbits for caller */
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = new_vmbits;
 }
 
 
-- 
2.43.0
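
For readers who want to try the horizon selection from 0006 outside the tree, here is a minimal standalone sketch of the decision order in get_conflict_xid(). It is not the patch code. The names xid_t, xid_follows(), sketch_conflict_xid(), INVALID_XID, FIRST_NORMAL_XID, VM_ALL_VISIBLE and VM_ALL_FROZEN are stand-ins invented for this illustration, and xid_follows() is a simplified imitation of TransactionIdFollows() that assumes 32-bit XIDs compared modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and the VM flag bits. */
typedef uint32_t xid_t;

#define INVALID_XID			((xid_t) 0)
#define FIRST_NORMAL_XID	((xid_t) 3)
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02

/*
 * Simplified imitation of TransactionIdFollows(): special XIDs compare as
 * plain integers, normal XIDs compare modulo 2^32.
 */
static bool
xid_follows(xid_t a, xid_t b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Sketch of the decision order used to pick one horizon per record. */
static xid_t
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					xid_t latest_xid_removed,
					xid_t frz_conflict_horizon,
					xid_t visibility_cutoff_xid)
{
	xid_t		conflict_xid = INVALID_XID;

	/* Mirrors the assertion in the patch: no VM update, no new bits. */
	assert(do_set_vm || new_vmbits == 0);

	/* Already all-visible page that only becomes all-frozen: no conflict. */
	if (!do_prune && !do_freeze && do_set_vm &&
		(old_vmbits & VM_ALL_VISIBLE) != 0 &&
		(new_vmbits & VM_ALL_FROZEN) != 0)
		return INVALID_XID;

	/* A VM update takes precedence over the freeze horizon. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* A newer xmax among removed tuples widens the horizon. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

With these stand-ins, a page that is pruned and set all-visible in one record ends up with whichever of visibility_cutoff_xid and latest_xid_removed is newer, while the all-visible to all-frozen case with no pruning or freezing carries no horizon at all.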



  [text/x-patch] v30-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v30-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From b30b92789f9b62e60348bd1441f03031e1bf7309 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v30 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f56a02a3d46..d22d2a86ed0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1867,9 +1867,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1886,13 +1889,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v30-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (24.6K, 9-v30-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From fb26088478e331440a2747031ba259e2adc9808e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v30 08/16] Remove XLOG_HEAP2_VISIBLE entirely

No remaining callers emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 370 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 49d3ebb0063..b099483051a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * so there is no point in optimizing it.
 			 */
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
-									 params->relation->rd_locator);
+			visibilitymap_set(blockno, vmbuffer, new_vmbits,
+							  params->relation->rd_locator);
 		}
 
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d22d2a86ed0..93f0f39c5f0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1889,11 +1889,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2771,9 +2771,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index cdcb475e501..d30fee3a488 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,106 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -341,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 787c19e5fef..a6580ea6188 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0

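For readers following along, here is a minimal standalone sketch of the bit arithmetic that the renamed visibilitymap_set() boils down to once WAL logging and buffer locking are handled by the caller. It is not the patch's code: the lower-case helper names are hypothetical, and the layout constants (two bits per heap block, 8168 usable map bytes per VM page) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VM_ALL_VISIBLE  0x01
#define VM_ALL_FROZEN   0x02
#define VM_VALID_BITS   0x03

/* Assumed layout, for illustration only. */
#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  4
#define MAP_BYTES_PER_PAGE   8168u	/* hypothetical usable bytes per VM page */
#define HEAPBLOCKS_PER_PAGE  (MAP_BYTES_PER_PAGE * HEAPBLOCKS_PER_BYTE)

/* Which VM page covers this heap block. */
static inline uint32_t
heapblk_to_mapblock(uint32_t heap_blk)
{
	return heap_blk / HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page holds the block's bits. */
static inline uint32_t
heapblk_to_mapbyte(uint32_t heap_blk)
{
	return (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of the block's two bits within that byte. */
static inline unsigned
heapblk_to_offset(uint32_t heap_blk)
{
	return (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
}

/* Read the two status bits for a heap block from one VM page's map bytes. */
static inline uint8_t
vm_get_bits(const uint8_t *map, uint32_t heap_blk)
{
	return (uint8_t) ((map[heapblk_to_mapbyte(heap_blk)] >>
					   heapblk_to_offset(heap_blk)) & VM_VALID_BITS);
}

/*
 * OR the requested flags into the map.  Returns 1 if the map changed (the
 * caller would then dirty the buffer), 0 if the bits were already set.
 */
static inline int
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	assert((flags & VM_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible */
	assert(flags != VM_ALL_FROZEN);

	if (vm_get_bits(map, heap_blk) == flags)
		return 0;

	map[heapblk_to_mapbyte(heap_blk)] |=
		(uint8_t) (flags << heapblk_to_offset(heap_blk));
	return 1;
}
```

With these assumed constants, heap block 5 lands in byte 1 at bit offset 2, so setting VM_ALL_VISIBLE there turns that byte into 0x04.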


  [text/x-patch] v30-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (2.4K, 10-v30-0009-Simplify-heap_page_would_be_all_visible-visibili.patch)
From 667b2e7c19c70694912223bc35d8f286a439dacd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v30 09/16] Simplify heap_page_would_be_all_visible visibility
 check

heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.

Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().

This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93f0f39c5f0..e827ca21c68 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	{
 		ItemId		itemid;
 		HeapTupleData tuple;
+		TransactionId dead_after;
 
 		/*
 		 * Set the offset number so that we can display it along with any
@@ -3576,12 +3577,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
 		{
 			case HEAPTUPLE_LIVE:
 				{
 					TransactionId xmin;
 
+					Assert(!TransactionIdIsValid(dead_after));
+
 					/* Check comments in lazy_scan_prune. */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
@@ -3614,8 +3617,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				}
 				break;
 
-			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_RECENTLY_DEAD:
+				Assert(TransactionIdIsValid(dead_after));
+				/* FALLTHROUGH */
+			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				{
-- 
2.43.0



  [text/x-patch] v30-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (3.2K, 11-v30-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch)
From 8025146e100c0433670acb8dafa722b743842e2a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v30 10/16] Remove table_scan_analyze_next_tuple unneeded
 parameter OldestXmin

heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.

Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.

Suggested-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
 src/backend/access/heap/heapam_handler.c | 13 +++++++++----
 src/include/access/tableam.h             |  3 +--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..8707d1aab4a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
 }
 
 static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
 							   double *liverows, double *deadrows,
 							   TupleTableSlot *slot)
 {
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;
 		bool		sample_it = false;
+		TransactionId dead_after;
 
 		itemid = PageGetItemId(targpage, hscan->rs_cindex);
 
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 		targtuple->t_len = ItemIdGetLength(itemid);
 
-		switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
-										 hscan->rs_cbuf))
+		switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+												hscan->rs_cbuf,
+												&dead_after))
 		{
 			case HEAPTUPLE_LIVE:
 				sample_it = true;
 				*liverows += 1;
 				break;
 
-			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_RECENTLY_DEAD:
+				Assert(TransactionIdIsValid(dead_after));
+				/* FALLTHROUGH */
+
+			case HEAPTUPLE_DEAD:
 				/* Count dead and recently-dead rows */
 				*deadrows += 1;
 				break;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..767f5be838a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
 	 * callback).
 	 */
 	bool		(*scan_analyze_next_tuple) (TableScanDesc scan,
-											TransactionId OldestXmin,
 											double *liverows,
 											double *deadrows,
 											TupleTableSlot *slot);
@@ -1718,7 +1717,7 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 							  double *liverows, double *deadrows,
 							  TupleTableSlot *slot)
 {
-	return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+	return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
 															liverows, deadrows,
 															slot);
 }
-- 
2.43.0

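The counting rule that 0010 relies on can be sketched in isolation. This is a simplified illustration, not the heapam code: the TupleStatus enum and count_rows() are hypothetical stand-ins, and only the two cases visible in the hunk are modelled. Any other state is simply skipped here, whereas the real function has further handling that the hunk does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the visibility result of one tuple. */
typedef enum
{
	TUPLE_LIVE,
	TUPLE_DEAD,
	TUPLE_RECENTLY_DEAD,
	TUPLE_IN_PROGRESS
} TupleStatus;

/*
 * Count live and dead rows.  Dead and recently-dead tuples fall into the
 * same bucket, which is why no OldestXmin cutoff is needed to tell them
 * apart.
 */
static inline void
count_rows(const TupleStatus *status, size_t ntuples,
		   double *liverows, double *deadrows)
{
	assert(liverows != NULL && deadrows != NULL);

	for (size_t i = 0; i < ntuples; i++)
	{
		switch (status[i])
		{
			case TUPLE_LIVE:
				*liverows += 1;
				break;

			case TUPLE_RECENTLY_DEAD:
			case TUPLE_DEAD:
				/* Count dead and recently-dead rows alike */
				*deadrows += 1;
				break;

			default:
				/* not modelled in this sketch */
				break;
		}
	}
}
```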


  [text/x-patch] v30-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.9K, 12-v30-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From de44d947223fa6c56fb3c75f8c32517068cf05ac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v30 11/16] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 53 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 38 ++++++++++-----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index bf899c2d2c6..7d9bd28d8f0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b099483051a..c507231d2a4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e827ca21c68..7463d46891b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2725,7 +2725,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3486,7 +3486,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3502,7 +3502,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 					Assert(!TransactionIdIsValid(dead_after));
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3594,16 +3594,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3634,6 +3635,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 88e79c58a10..5657b1df46b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0

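The deferred check described in 0011 can be illustrated with a small standalone sketch. It makes several simplifying assumptions: VisHorizon is a hypothetical one-XID stand-in for GlobalVisState (the real structure is more involved), the helper names are invented, and every xmin passed in is assumed to belong to a live tuple whose inserter committed. The point is only the shape of the logic: track the newest xmin while scanning, then do a single horizon test per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)

/* Hypothetical stand-in for GlobalVisState: a single horizon XID. */
typedef struct VisHorizon
{
	TransactionId visible_before;	/* XIDs preceding this are visible to all */
} VisHorizon;

static inline bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b" for normal XIDs. */
static inline bool
xid_precedes(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* True if some snapshot might still treat xid as running. */
static inline bool
xid_maybe_running(const VisHorizon *h, TransactionId xid)
{
	return !xid_precedes(xid, h->visible_before);
}

/*
 * Scan the xmins of a page's live, committed tuples.  The per-tuple step
 * only tracks the newest xmin; the comparatively expensive horizon test
 * runs once, after the loop.
 */
static inline bool
page_all_visible(const TransactionId *xmins, size_t ntuples,
				 const VisHorizon *h, TransactionId *cutoff)
{
	*cutoff = InvalidTransactionId;

	for (size_t i = 0; i < ntuples; i++)
	{
		/* xid_precedes(old, new) is the same test as "new follows old" */
		if (xid_is_normal(xmins[i]) && xid_precedes(*cutoff, xmins[i]))
			*cutoff = xmins[i];
	}

	/* If the newest xmin is visible to all, so is every older one. */
	if (xid_is_normal(*cutoff) && xid_maybe_running(h, *cutoff))
		return false;
	return true;
}
```

A page with xmins 100, 250 and 180 is all-visible against a horizon of 300 but not against a horizon of 200, and in both cases the tracked cutoff is 250.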


  [text/x-patch] v30-0012-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 13-v30-0012-Unset-all_visible-sooner-if-not-freezing.patch)
From d7205c9eb70670edafd5098a5a712f2b3f8ff919 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v30 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c507231d2a4..8e59e7692c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If not attempting freezing, unset all_visible and all_frozen now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all_visible and all_frozen now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0

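The control flow that 0012 changes is small enough to sketch on its own. PruneSketch and both functions below are hypothetical simplifications, not the pruneheap.c code. In particular, finish_page() is an assumption about what the "later" step amounts to, inferred from the comment saying the flags are unset at the end of heap_page_prune_and_freeze() before the visibility map is updated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for the per-page prune state. */
typedef struct PruneSketch
{
	bool		attempt_freeze;
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
} PruneSketch;

/*
 * Record one dead item.  When freezing will be attempted, leave the flags
 * alone so a removable dead tuple does not preclude freezing the page.
 * Otherwise there is nothing to wait for, so clear them right away.
 */
static inline void
record_dead_item(PruneSketch *ps)
{
	assert(ps->lpdead_items >= 0);

	if (!ps->attempt_freeze)
	{
		ps->all_visible = false;
		ps->all_frozen = false;
	}
	ps->lpdead_items++;
}

/* Assumed end-of-page step: the deferred clearing happens here. */
static inline void
finish_page(PruneSketch *ps)
{
	if (ps->lpdead_items > 0)
	{
		ps->all_visible = false;
		ps->all_frozen = false;
	}
}
```

Either way the flags end up cleared before the VM could be updated; the early path just avoids carrying state that will never be used.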


  [text/x-patch] v30-0013-Track-which-relations-are-modified-by-a-query.patch (2.6K, 14-v30-0013-Track-which-relations-are-modified-by-a-query.patch)
From d8acbf1885d6cae2a8954ff466e482fbad82ee9c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v30 13/16] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..9df7df17e96 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..13b42b5e6d1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v30-0014-Pass-down-information-on-table-modification-to-s.patch (23.7K, 15-v30-0014-Pass-down-information-on-table-modification-to-s.patch)
From 51cd27b100af830038376794ad72b15b315551af Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v30 14/16] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
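A rough sketch of how the hint travels, assuming simplified flag
values. The DEMO_ names and the three helper functions are
hypothetical; only the 1 << 10 bit position for the read-only hint is
taken from the patch. The point it illustrates is that the begin-scan
wrappers now OR their fixed flags into whatever the caller passed
instead of overwriting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; only the 1 << 10 position comes from the patch. */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 8,
	DEMO_SO_HINT_REL_READ_ONLY = 1 << 10
};

/* Scan node side: hint read-only unless the query modifies the relation. */
uint32_t
demo_scan_flags(bool rel_is_modified)
{
	return rel_is_modified ? 0 : DEMO_SO_HINT_REL_READ_ONLY;
}

/* Wrapper side: add the fixed scan-type flags to the caller's flags. */
uint32_t
demo_beginscan_flags(uint32_t caller_flags)
{
	/* Callers pass hints only; the scan type is chosen by the wrapper. */
	assert((caller_flags & DEMO_SO_TYPE_SEQSCAN) == 0);
	caller_flags |= DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE;
	return caller_flags;
}

/* Access method side: recover the boolean from the flags. */
bool
demo_modifies_base_rel(uint32_t flags)
{
	return !(flags & DEMO_SO_HINT_REL_READ_ONLY);
}
```

Callers that pass 0 keep the conservative default: without the hint,
the relation is treated as possibly modified.
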
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 93 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 45d306037a4..5c4bf5f0c6e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fc6af7c751b..b2457b96dcc 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8707d1aab4a..fc251e11f8a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5657b1df46b..ba62a4d4cba 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 767f5be838a..a7cfb125a5d 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v30-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.0K, 16-v30-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 665f41020eeea237c5538d679ae248161257a87b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v30 15/16] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++++-
 src/backend/access/heap/pruneheap.c           | 40 ++++++++++++++++++-
 src/include/access/heapam.h                   | 24 +++++++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fc251e11f8a..6946da8c9d7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8e59e7692c1..f414f02964d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  Relation relation,
 								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 								  Buffer vmbuffer,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
  * corrupted, it will fix them by clearing the VM bits and visibility hint.
  * This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * current value of the VM bits in *old_vmbits and the desired new value of
  * the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
 					  Relation relation,
 					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 					  Buffer vmbuffer,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
 					  int nlpdead_items,
 					  uint8 *old_vmbits,
 					  uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
 	if (!prstate->attempt_update_vm)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
 										   &vmbuffer);
 
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  buffer,
 									  page,
 									  vmbuffer,
+									  params->reason,
+									  do_prune, do_freeze,
 									  prstate.lpdead_items,
 									  &old_vmbits,
 									  &new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ba62a4d4cba..b0e7c71463c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
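
The on-access check that the patch above adds to heap_page_will_set_vm()
can be read as a pure predicate over a handful of booleans. The sketch
below is only a model of that condition, not the patch's code:
OnAccessInputs and skip_vm_update() are hypothetical names, and the two
buffer-state fields stand in for BufferIsDirty() and
XLogCheckBufferNeedsBackup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the state the patch reads from PruneState,
 * the buffer manager, and the WAL machinery.
 */
typedef struct OnAccessInputs
{
	bool		on_access;		/* models reason == PRUNE_ON_ACCESS */
	bool		all_visible;	/* pruning found the page all-visible */
	bool		do_prune;		/* pruning will modify the heap page */
	bool		do_freeze;		/* freezing will modify the heap page */
	bool		buffer_dirty;	/* models BufferIsDirty(heap_buffer) */
	bool		needs_fpi;		/* models XLogCheckBufferNeedsBackup() */
} OnAccessInputs;

/*
 * Returns true when the VM update should be skipped: an on-access call
 * that would otherwise only set the VM, where doing so would newly dirty
 * the heap page or force a full-page image into the WAL.
 */
static bool
skip_vm_update(const OnAccessInputs *in)
{
	assert(in != NULL);

	return in->on_access &&
		in->all_visible &&
		!in->do_prune && !in->do_freeze &&
		(!in->buffer_dirty || in->needs_fpi);
}
```

In the patch, taking this branch also resets prstate->all_visible and
prstate->all_frozen to false before returning, which the model leaves out.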



  [text/x-patch] v30-0016-Set-pd_prune_xid-on-insert.patch (6.7K, 17-v30-0016-Set-pd_prune_xid-on-insert.patch)
From 6a843b21a4dd795004dfecbd0c321271beae8120 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v30 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
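
For readers unfamiliar with the pd_prune_xid hint, here is a toy model of
how the insert paths in the patch above are expected to behave. It assumes
that PageSetPrunable() keeps the oldest xid it has been given, and it uses
simplified stand-ins (ToyXid, toy_xid_precedes(), and so on) rather than
the real TransactionId machinery. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;		/* stand-in for TransactionId */

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

/* Special xids (invalid, bootstrap, frozen) sort below the first normal one. */
static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is older than b"; only meaningful for normal xids. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Assumed PageSetPrunable() behavior: remember the oldest xid seen. */
static void
toy_page_set_prunable(ToyXid *pd_prune_xid, ToyXid xid)
{
	assert(toy_xid_is_normal(xid));

	if (*pd_prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/*
 * Guard from the multi-insert hunk: skip the hint when the page is being
 * set frozen in the VM, or when the xid is not a normal one (bootstrap).
 */
static void
toy_hint_after_multi_insert(ToyXid *pd_prune_xid, ToyXid xid,
							bool all_frozen_set)
{
	if (!all_frozen_set && toy_xid_is_normal(xid))
		toy_page_set_prunable(pd_prune_xid, xid);
}
```

Starting from an unset hint, inserts by xids 100, 200 and 50 leave the
hint at 50 in this model, while a frozen multi-insert or a bootstrap-mode
xid leaves it untouched.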




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-22 18:20  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-22 18:20 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>; Chao Li <[email protected]>

On Sat, Dec 20, 2025 at 7:32 AM Kirill Reshke <[email protected]> wrote:
>
> Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. Origins
> of this code track down to fdf9e21196a6 which was committed as part of
> [0], at which point
> there was no HeapTupleSatisfiesVacuumHorizon function. I guess this is
> the reason this optimization was not performed earlier.

Thanks for taking a look into this!

> I also think this patch is correct, because we do similar things for
> HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and
> HeapTupleSatisfiesVacuum is just a proxy to
> HeapTupleSatisfiesVacuumHorizon, with the only difference being DEAD vs.
> RECENTLY_DEAD handling.
>
> A similar change could be made in heapam_scan_analyze_next_tuple:
>
> ...
> case HEAPTUPLE_DEAD:
> case HEAPTUPLE_RECENTLY_DEAD:
> /* Count dead and recently-dead rows */
> *deadrows += 1;
> break;

In v30 sent here [1], I did end up making this change in 0010. I just
realized that I should have also changed
table_scan_analyze_next_tuple() and removed the call to
GetOldestRemovableTransactionId(). I've done that in attached v31.
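
To spell out why ANALYZE does not need the horizon, here is a small
self-contained model. It assumes, as described in the quoted text, that
the wrapper differs from the horizon-free check only by upgrading
RECENTLY_DEAD to DEAD when the tuple is older than OldestXmin. The enum
and function names are hypothetical stand-ins, not the real heapam
symbols.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ToyStatus
{
	TOY_DEAD,
	TOY_RECENTLY_DEAD,
	TOY_LIVE
} ToyStatus;

/*
 * Models the wrapper: takes the horizon-free result and upgrades
 * RECENTLY_DEAD to DEAD when the deleting xid precedes OldestXmin.
 */
static ToyStatus
toy_satisfies_vacuum(ToyStatus horizon_result, bool precedes_oldest_xmin)
{
	if (horizon_result == TOY_RECENTLY_DEAD && precedes_oldest_xmin)
		return TOY_DEAD;
	return horizon_result;
}

/* Models the ANALYZE switch: both dead flavors bump *deadrows. */
static bool
toy_counts_as_dead_row(ToyStatus status)
{
	return status == TOY_DEAD || status == TOY_RECENTLY_DEAD;
}

/* The dead-row count is the same with or without the horizon. */
static void
toy_check_equivalence(void)
{
	const ToyStatus all[] = {TOY_DEAD, TOY_RECENTLY_DEAD, TOY_LIVE};

	for (int i = 0; i < 3; i++)
	{
		bool		direct = toy_counts_as_dead_row(all[i]);

		assert(toy_counts_as_dead_row(toy_satisfies_vacuum(all[i], true)) == direct);
		assert(toy_counts_as_dead_row(toy_satisfies_vacuum(all[i], false)) == direct);
	}
}
```

Since the caller treats the two dead states identically, whatever the
horizon comparison decides cannot change the count in this model.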

I'm not sure we should change the table AM API (by removing
OldestXmin), though. I looked for table AMs implementing
scan_analyze_next_tuple() to see if they use OldestXmin. I found two,
OrioleDB [2] and Citus columnar [3]; both implement
scan_analyze_next_tuple() and neither uses OldestXmin. I
couldn't easily find other table AMs implementing
scan_analyze_next_tuple(). I don't have a strong sense of whether or
not I should make this change. Changing it adds churn to a public API
and doesn't enable anything new.

I could also just leave it unused by heapam's implementation. I
haven't checked which, if any, other table AM callbacks have
parameters that are completely unused by their heap implementation.

So, I'm on the fence about whether or not to make the change at all,
and, if I do, whether or not to change the table AM callback. That is
done in v31, though, so we can discuss.

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_ZCjHoRPfQ8AbMrFY8TOMCPAvZ0_m9SX7yg0edfTk45-g%40mail.gma...
[2] https://github.com/orioledb/orioledb/blob/acff65984d106dabf708a179e2c6694297e08c02/src/tableam/handl...
[3] https://github.com/citusdata/citus/blob/ee3812d267db3ab007efb6f5f432c82c1f448695/src/backend/columna...


Attachments:

  [text/x-patch] v31-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (10.2K, 2-v31-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
From ec1755a3055229d5bef9cc963f8f6b7edb2a1cd3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v31 01/16] Combine visibilitymap_set() cases in
 lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).

In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().

Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.

Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.

This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Srinath Reddy Sadipiralla <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 44 ++++++++++
 contrib/pg_visibility/sql/pg_visibility.sql   | 20 +++++
 src/backend/access/heap/vacuumlazy.c          | 87 ++++---------------
 3 files changed, 82 insertions(+), 69 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 --
 -- recently-dropped table
 --
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map 
+----------------------------
+ 
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column? 
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary 
+---------------------------
+ (1,1)
+(1 row)
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 -- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
 
 --
 -- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+        FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
 -- test copy freeze
 create table copyfreeze (a int, b char(1500));
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * of last heap_vac_scan_next_block() call), and from all_visible and
 	 * all_frozen variables
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.all_visible && !all_visible_according_to_vm) ||
+		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
 	{
 		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		/*
 		 * It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * checksums are not enabled).  Regardless, set both bits so that we
 		 * get back in sync.
 		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
+		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
+		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+		 * removed -- and that isn't worth optimizing for. And if we add the
+		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+		 * it must be marked dirty.
 		 */
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!presult.all_frozen ||
+			   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
 							VISIBILITYMAP_VALID_BITS);
 	}
 
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
 	return presult.ndeleted;
 }
 
-- 
2.43.0
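
The merged condition in the patch above can be summarized as "set the VM
if the page has a property the VM does not yet record". The sketch below
models that with plain bit flags. The TOY_* names and
toy_vm_bits_to_set() are hypothetical, and the model reads both bits from
a single vm_bits value, whereas the patch at this point still combines a
cached all-visible status with a fresh VM_ALL_FROZEN() lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02
#define TOY_VM_VALID_BITS	(TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN)

/*
 * Returns the flags to pass to the VM-setting call, or 0 when the VM
 * already reflects everything pruning learned about the page.
 */
static uint8_t
toy_vm_bits_to_set(bool page_all_visible, bool page_all_frozen,
				   uint8_t vm_bits)
{
	bool		vm_all_visible = (vm_bits & TOY_VM_ALL_VISIBLE) != 0;
	bool		vm_all_frozen = (vm_bits & TOY_VM_ALL_FROZEN) != 0;
	uint8_t		flags;

	assert((vm_bits & ~TOY_VM_VALID_BITS) == 0);

	/* Single merged test replacing the two former branches. */
	if (!((page_all_visible && !vm_all_visible) ||
		  (page_all_frozen && !vm_all_frozen)))
		return 0;

	/* All-visible is always set; all-frozen is added when it applies. */
	flags = TOY_VM_ALL_VISIBLE;
	if (page_all_frozen)
		flags |= TOY_VM_ALL_FROZEN;
	return flags;
}
```

For example, a page found all-frozen whose VM entry only has the
all-visible bit yields both bits, which is the case the removed else-if
branch used to handle separately.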



  [text/x-patch] v31-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (17.0K, 3-v31-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
From f55678510379299ca66cf78fbf6e08ec8ecda0d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v31 02/16] Eliminate use of cached VM value in
 lazy_scan_prune()

lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.

Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.

Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The restructuring also makes it possible for the VM to be
newly set after corruption is fixed, if pruning found the page
all-visible.

Now that no callers of visibilitymap_set() use its return value, change
its (and visibilitymap_set_vmbits()) return type to void.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
 src/backend/access/heap/vacuumlazy.c    | 182 +++++++++++-------------
 src/backend/access/heap/visibilitymap.c |   9 +-
 src/include/access/visibilitymap.h      |  18 +--
 3 files changed, 94 insertions(+), 115 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..d47ed7814c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
  */
 #define EAGER_SCAN_REGION_SIZE 4096
 
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
 typedef struct LVRelState
 {
 	/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
 	/* State maintained by heap_vac_scan_next_block() */
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
-	bool		next_unskippable_allvis;	/* its visibility status */
 	bool		next_unskippable_eager_scanned; /* if it was eagerly scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
 
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
+							Buffer vmbuffer,
 							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	/* Initialize for the first heap_vac_scan_next_block() call */
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
-	vacrel->next_unskippable_allvis = false;
 	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
 										MAIN_FORKNUM,
 										heap_vac_scan_next_block,
 										vacrel,
-										sizeof(uint8));
+										sizeof(bool));
 
 	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		uint8		blk_info = 0;
+		bool		was_eager_scanned = false;
 		int			ndeleted = 0;
 		bool		has_lpdead_items;
 		void	   *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (!BufferIsValid(buf))
 			break;
 
-		blk_info = *((uint8 *) per_buffer_data);
+		was_eager_scanned = *((bool *) per_buffer_data);
 		CheckBufferIsPinnedOnce(buf);
 		page = BufferGetPage(buf);
 		blkno = BufferGetBlockNumber(buf);
 
 		vacrel->scanned_pages++;
-		if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+		if (was_eager_scanned)
 			vacrel->eager_scanned_pages++;
 
 		/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
 									   vmbuffer,
-									   blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
 									   &has_lpdead_items, &vm_page_frozen);
 
 		/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * exclude pages skipped due to cleanup lock contention from eager
 		 * freeze algorithm caps.
 		 */
-		if (got_cleanup_lock &&
-			(blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+		if (got_cleanup_lock && was_eager_scanned)
 		{
 			/* Aggressive vacuums do not eager scan. */
 			Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
 {
 	BlockNumber next_block;
 	LVRelState *vacrel = callback_private_data;
-	uint8		blk_info = 0;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
 		 * otherwise they would've been unskippable.
 		 */
 		vacrel->current_block = next_block;
-		blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		*((uint8 *) per_buffer_data) = blk_info;
+		/* Block was not eager scanned */
+		*((bool *) per_buffer_data) = false;
 		return vacrel->current_block;
 	}
 	else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
 		Assert(next_block == vacrel->next_unskippable_block);
 
 		vacrel->current_block = next_block;
-		if (vacrel->next_unskippable_allvis)
-			blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
-		if (vacrel->next_unskippable_eager_scanned)
-			blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
-		*((uint8 *) per_buffer_data) = blk_info;
+		*((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
 		return vacrel->current_block;
 	}
 }
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
 	bool		next_unskippable_eager_scanned = false;
-	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 													   next_unskippable_block,
 													   &next_unskippable_vmbuffer);
 
-		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
 		/*
 		 * At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
 		 */
-		if (!next_unskippable_allvis)
+		if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
 		{
 			Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
 			break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
-	vacrel->next_unskippable_allvis = next_unskippable_allvis;
 	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
  * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
 				bool *has_lpdead_items,
 				bool *vm_page_frozen)
 {
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
+	uint8		old_vmbits = 0;
+	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if ((presult.all_visible && !all_visible_according_to_vm) ||
-		(presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
-		 * unnecessarily dirtying the heap buffer. Nearly the only scenario
-		 * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
-		 * removed -- and that isn't worth optimizing for. And if we add the
-		 * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
-		 * it must be marked dirty.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!presult.all_frozen ||
-			   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
+	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
 	/*
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
 	 * with buffer lock before concluding that the VM is corrupt.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if (!PageIsAllVisible(page) &&
+		(old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2169,6 +2086,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
+		/* VM bits are now clear */
+		old_vmbits = 0;
 	}
 
 	/*
@@ -2196,6 +2115,71 @@ lazy_scan_prune(LVRelState *vacrel,
 		MarkBufferDirty(buf);
 		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
 							VISIBILITYMAP_VALID_BITS);
+		/* VM bits are now clear */
+		old_vmbits = 0;
+	}
+
+	if (!presult.all_visible)
+		return presult.ndeleted;
+
+	/* Set the visibility map and page visibility hint */
+	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (presult.all_frozen)
+		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	/* Nothing to do */
+	if (old_vmbits == new_vmbits)
+		return presult.ndeleted;
+
+	Assert(presult.all_visible);
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled). Regardless, set both bits so that we get back in
+	 * sync.
+	 *
+	 * The heap buffer must be marked dirty before adding it to the WAL chain
+	 * when setting the VM. We don't worry about unnecessarily dirtying the
+	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+	 * the VM bits clear, so there is no point in optimizing it.
+	 */
+	PageSetAllVisible(page);
+	MarkBufferDirty(buf);
+
+	/*
+	 * If the page is being set all-frozen, we pass InvalidTransactionId as
+	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+	 * everything safe for REDO was logged when the page's tuples were frozen.
+	 */
+	Assert(!presult.all_frozen ||
+		   !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+	visibilitymap_set(vacrel->rel, blkno, buf,
+					  InvalidXLogRecPtr,
+					  vmbuffer, presult.vm_conflict_horizon,
+					  new_vmbits);
+
+	/*
+	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
+	 * count it as newly set for logging.
+	 */
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	{
+		vacrel->vm_new_visible_pages++;
+		if (presult.all_frozen)
+		{
+			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
+		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 presult.all_frozen)
+	{
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..cdcb475e501 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
  * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
  */
-uint8
+void
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	}
 
 	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
 }
 
 /*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  *
  * rlocator is used only for debugging messages.
  */
-uint8
+void
 visibilitymap_set_vmbits(BlockNumber heapBlk,
 						 Buffer vmBuf, uint8 flags,
 						 const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
 		map[mapByte] |= (flags << mapOffset);
 		MarkBufferDirty(vmBuf);
 	}
-
-	return status;
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..787c19e5fef 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+							  BlockNumber heapBlk, Buffer heapBuf,
+							  XLogRecPtr recptr,
+							  Buffer vmBuf,
+							  TransactionId cutoff_xid,
+							  uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+									 Buffer vmBuf, uint8 flags,
+									 const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0
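
For readers following the v31-0002 diff above, here is a minimal standalone sketch of the decision and bookkeeping that lazy_scan_prune() performs once visibilitymap_set() no longer returns the old VM bits. It is not part of the patch set. The flag macros are local stand-ins assumed to match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. VmCounters, desired_vmbits() and account_vm_update() are hypothetical names used only for illustration. The sketch also resets *page_frozen on entry, which the patch itself does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the flags in access/visibilitymapdefs.h (assumed values) */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical counters mirroring the vm_new_* fields of LVRelState */
typedef struct VmCounters
{
    int new_visible_pages;
    int new_visible_frozen_pages;
    int new_frozen_pages;
} VmCounters;

/* Desired VM bits for a page; 0 means the page is not all-visible */
static uint8_t
desired_vmbits(bool all_visible, bool all_frozen)
{
    uint8_t bits;

    /* Same invariant the patch asserts: all-frozen implies all-visible */
    assert(!all_frozen || all_visible);

    if (!all_visible)
        return 0;

    bits = VM_ALL_VISIBLE;
    if (all_frozen)
        bits |= VM_ALL_FROZEN;
    return bits;
}

/*
 * Returns true if the VM would be updated. old_vmbits must already reflect
 * any clearing done by the corruption checks. Counters are bumped the same
 * way as in the diff: a page that was not all-visible counts as newly
 * visible (and possibly newly visible-and-frozen); a page that was already
 * all-visible counts only as newly frozen.
 */
static bool
account_vm_update(uint8_t old_vmbits, bool all_visible, bool all_frozen,
                  VmCounters *counters, bool *page_frozen)
{
    uint8_t new_vmbits = desired_vmbits(all_visible, all_frozen);

    *page_frozen = false;       /* sketch-only convenience, see lead-in */

    /* Not all-visible, or nothing would change: no VM update */
    if (new_vmbits == 0 || new_vmbits == old_vmbits)
        return false;

    if ((old_vmbits & VM_ALL_VISIBLE) == 0)
    {
        counters->new_visible_pages++;
        if (all_frozen)
        {
            counters->new_visible_frozen_pages++;
            *page_frozen = true;
        }
    }
    else if ((old_vmbits & VM_ALL_FROZEN) == 0 && all_frozen)
    {
        counters->new_frozen_pages++;
        *page_frozen = true;
    }

    return true;
}
```

The point of the sketch is that old_vmbits is read once up front (and zeroed if the corruption checks cleared the VM), so the early "nothing to do" return and the counters no longer depend on a return value from visibilitymap_set().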



  [text/x-patch] v31-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.7K, 4-v31-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
From d196bdeefae2b14ca3b7abf22b6d6cffca116cd4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v31 03/16] Refactor lazy_scan_prune() VM clear logic into
 helper

Encapsulating the VM corruption checks in a helper makes
lazy_scan_prune() clearer. There is no functional change; the logic is
only moved into the helper.

Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
 1 file changed, 85 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d47ed7814c8..c5fc5b71f94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page,
+										   int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1928,6 +1933,83 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,54 +2152,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
 
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(page) &&
-		(old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		/* VM bits are now clear */
+	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+									   presult.lpdead_items, vmbuffer,
+									   old_vmbits))
 		old_vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		/* VM bits are now clear */
-		old_vmbits = 0;
-	}
 
 	if (!presult.all_visible)
 		return presult.ndeleted;
-- 
2.43.0
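
As a reading aid for the v31-0003 diff above, the two checks in identify_and_fix_vm_corruption() can be viewed as a pure classification over three inputs. The sketch below is not part of the patch set and assumes that view. VmCorruption and classify_vm_corruption() are hypothetical names, and VM_VALID_BITS is a local stand-in assumed to match VISIBILITYMAP_VALID_BITS. The real helper additionally emits a WARNING and clears the VM bits (and PD_ALL_VISIBLE in the second case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for VISIBILITYMAP_VALID_BITS (assumed value) */
#define VM_VALID_BITS 0x03

/* Hypothetical result type; the patch's helper returns a plain bool */
typedef enum VmCorruption
{
    VM_OK,                      /* nothing to fix */
    VM_SET_BUT_PAGE_CLEAR,      /* VM bit set, PD_ALL_VISIBLE clear */
    VM_ALLVIS_WITH_LPDEAD       /* PD_ALL_VISIBLE set, LP_DEAD items present */
} VmCorruption;

/*
 * Classify the state seen under the exclusive heap buffer lock. The order
 * of the checks follows the diff: the VM-vs-page mismatch is tested first,
 * and the LP_DEAD check applies only to pages with PD_ALL_VISIBLE set.
 */
static VmCorruption
classify_vm_corruption(bool page_all_visible, int nlpdead_items,
                       uint8_t vmbits)
{
    assert(nlpdead_items >= 0);

    if (!page_all_visible && (vmbits & VM_VALID_BITS) != 0)
        return VM_SET_BUT_PAGE_CLEAR;

    if (page_all_visible && nlpdead_items > 0)
        return VM_ALLVIS_WITH_LPDEAD;

    return VM_OK;
}
```

In both non-OK cases the caller in the diff treats the VM bits as cleared afterwards (old_vmbits = 0), which is why the helper only needs to report whether it fixed something.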



  [text/x-patch] v31-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.8K, 5-v31-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 566794eed6786868a1147e6a0436d74c0603ccdf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v31 04/16] Set the VM in heap_page_prune_and_freeze()

This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 315 +++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 150 +------------
 src/include/access/heapam.h          |  20 ++
 3 files changed, 299 insertions(+), 186 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..1c1446058a7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+										   BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+										   Buffer vmbuffer,
+										   uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+								  Relation relation,
+								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+								  Buffer vmbuffer,
+								  int nlpdead_items,
+								  uint8 *old_vmbits,
+								  uint8 *new_vmbits);
 
 
 /*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+							   BlockNumber heap_blk, Page heap_page,
+							   int nlpdead_items,
+							   Buffer vmbuffer,
+							   uint8 vmbits)
+{
+	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	if (!PageIsAllVisible(heap_page) &&
+		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(rel), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buffer);
+		visibilitymap_clear(rel, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+					  Relation relation,
+					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+					  Buffer vmbuffer,
+					  int nlpdead_items,
+					  uint8 *old_vmbits,
+					  uint8 *new_vmbits)
+{
+	*old_vmbits = 0;
+	*new_vmbits = 0;
+
+	if (!prstate->attempt_update_vm)
+		return false;
+
+	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
+										   &vmbuffer);
+
+	/* We do this even if not all-visible */
+	if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+									   nlpdead_items, vmbuffer,
+									   *old_vmbits))
+		*old_vmbits = 0;
+
+	if (!prstate->all_visible)
+		return false;
+
+	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->all_frozen)
+		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (*new_vmbits == *old_vmbits)
+	{
+		*new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	/* Set the visibility map and page visibility hint, if relevant */
+	if (do_set_vm)
+	{
+		Assert(prstate.all_visible);
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled). Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * The heap buffer must be marked dirty before adding it to the WAL
+		 * chain when setting the VM. We don't worry about unnecessarily
+		 * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+		 * It is extremely rare to have a clean heap buffer with
+		 * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+		 * point in optimizing it.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buffer);
+
+		/*
+		 * If the page is being set all-frozen, we pass InvalidTransactionId
+		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+		 * make everything safe for REDO was logged when the page's tuples
+		 * were frozen.
+		 */
+		Assert(!prstate.all_frozen ||
+			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+		visibilitymap_set(params->relation, blockno, buffer,
+						  InvalidXLogRecPtr,
+						  vmbuffer, presult->vm_conflict_horizon,
+						  new_vmbits);
+	}
+
+	/* Save the vmbits for caller */
+	presult->old_vmbits = old_vmbits;
+	presult->new_vmbits = new_vmbits;
 }
 
 
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c5fc5b71f94..8b489349312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 vmbits);
+
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1933,83 +1929,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		return true;
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2041,13 +1960,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2147,75 +2065,25 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-									   presult.lpdead_items, vmbuffer,
-									   old_vmbits))
-		old_vmbits = 0;
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	Assert(presult.all_visible);
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled). Regardless, set both bits so that we get back in
-	 * sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
 
 	/*
 	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
 	 * count it as newly set for logging.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if (presult.all_frozen)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..0913759219c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * If we will consider updating the visibility map, vmbuffer should
+	 * contain the correct block of the visibility map and be pinned.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
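
The VM decision that the patch above moves into heap_page_will_set_vm(), and the bookkeeping that lazy_scan_prune() now derives from presult.old_vmbits and presult.new_vmbits, can be sketched in isolation roughly as follows. This is a minimal standalone illustration, not the patch's code: the sketch_* names and the SketchCounters struct are hypothetical, the VM lookup and the corruption check are replaced by plain arguments, and the two bit values are assumed to match PostgreSQL's visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the definitions in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/*
 * Hypothetical stand-in for the decision made by heap_page_will_set_vm().
 * current_vmbits replaces visibilitymap_get_status(); corruption_cleared
 * replaces the result of identify_and_fix_vm_corruption().
 */
static bool
sketch_will_set_vm(bool attempt_update_vm, bool all_visible, bool all_frozen,
                   uint8_t current_vmbits, bool corruption_cleared,
                   uint8_t *old_vmbits, uint8_t *new_vmbits)
{
    /* A page cannot be all-frozen without also being all-visible */
    assert(!all_frozen || all_visible);

    *old_vmbits = 0;
    *new_vmbits = 0;

    /* Caller did not ask for VM updates: report 0 for both values */
    if (!attempt_update_vm)
        return false;

    /* Clearing corruption resets the VM bits, so treat them as 0 */
    *old_vmbits = corruption_cleared ? 0 : current_vmbits;

    if (!all_visible)
        return false;

    *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
    if (all_frozen)
        *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

    /* Nothing to do if the VM already holds the desired bits */
    if (*new_vmbits == *old_vmbits)
    {
        *new_vmbits = 0;
        return false;
    }

    return true;
}

/* Hypothetical counters mirroring the vm_new_* fields of LVRelState */
typedef struct SketchCounters
{
    int new_visible_pages;
    int new_visible_frozen_pages;
    int new_frozen_pages;
} SketchCounters;

/* Classify a VM transition the way lazy_scan_prune() does after the patch */
static void
sketch_count_vm_change(SketchCounters *c, uint8_t old_vmbits,
                       uint8_t new_vmbits)
{
    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    {
        c->new_visible_pages++;
        if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
            c->new_visible_frozen_pages++;
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
             (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
    {
        c->new_frozen_pages++;
    }
}
```

Because new_vmbits is reset to 0 whenever the VM is left alone, the counting code has to test new_vmbits explicitly instead of relying on an early return, which is why the patch adds the extra new_vmbits conditions in lazy_scan_prune().
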



  [text/x-patch] v31-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v31-0005-Move-VM-assert-into-prune-freeze-code.patch)
  download | inline diff:
From 9f5072500e2a3bc2f2a8490f1ca11bf60a81515a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v31 05/16] Move VM assert into prune/freeze code

This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assertion that cross-checks the page's
all-visible status into the prune/freeze code, before the VM is set. This
allows us to remove the all_visible, all_frozen, and vm_conflict_horizon
fields from PruneFreezeResult.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 86 ++++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 68 +---------------------
 src/include/access/heapam.h          | 25 +++-----
 3 files changed, 77 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1c1446058a7..7af6aea2d0e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
 	return true;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already, so we don't need to include it again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
 	/* Now update the visibility map and PD_ALL_VISIBLE hint */
 	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
 
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * make everything safe for REDO was logged when the page's tuples
 		 * were frozen.
 		 */
-		Assert(!prstate.all_frozen ||
-			   !TransactionIdIsValid(presult->vm_conflict_horizon));
+		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
 
 		visibilitymap_set(params->relation, blockno, buffer,
 						  InvalidXLogRecPtr,
-						  vmbuffer, presult->vm_conflict_horizon,
+						  vmbuffer, vm_conflict_horizon,
 						  new_vmbits);
 	}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b489349312..f56a02a3d46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,20 +457,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2006,32 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3489,29 +3449,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3535,15 +3472,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0913759219c..88e79c58a10 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v31-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.2K, 7-v31-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From eb94a7df040b6250d3ea3e0d1a79f24a3dc4fd6a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v31 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
 1 file changed, 157 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7af6aea2d0e..49d3ebb0063 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid);
 
 
 /*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		do_set_vm &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than the conflict_xid
+	 * calculated so far, we must use that xmax as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Helper to correct any corruption detected on a heap page and its
  * corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 	uint8		new_vmbits;
 	uint8		old_vmbits;
 
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the VM bits based on information from the VM and
+	 * the all_visible/all_frozen flags.
+	 */
+	do_set_vm = heap_page_will_set_vm(&prstate,
+									  params->relation,
+									  blockno,
+									  buffer,
+									  page,
+									  vmbuffer,
+									  prstate.lpdead_items,
+									  &old_vmbits,
+									  &new_vmbits);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									old_vmbits, new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+									 params->relation->rd_locator);
+		}
+
 		MarkBufferDirty(buffer);
 
 		/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	do_set_vm = heap_page_will_set_vm(&prstate,
-									  params->relation,
-									  blockno,
-									  buffer,
-									  page,
-									  vmbuffer,
-									  prstate.lpdead_items,
-									  &old_vmbits,
-									  &new_vmbits);
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	/* Set the visibility map and page visibility hint, if relevant */
-	if (do_set_vm)
+	if (prstate.attempt_freeze)
 	{
-		Assert(prstate.all_visible);
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled). Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * The heap buffer must be marked dirty before adding it to the WAL
-		 * chain when setting the VM. We don't worry about unnecessarily
-		 * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
-		 * It is extremely rare to have a clean heap buffer with
-		 * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
-		 * point in optimizing it.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buffer);
-
-		/*
-		 * If the page is being set all-frozen, we pass InvalidTransactionId
-		 * as the cutoff_xid, since a snapshot conflict horizon sufficient to
-		 * make everything safe for REDO was logged when the page's tuples
-		 * were frozen.
-		 */
-		Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
-		visibilitymap_set(params->relation, blockno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, vm_conflict_horizon,
-						  new_vmbits);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	/* Save the vmbits for caller */
-	presult->old_vmbits = old_vmbits;
-	presult->new_vmbits = new_vmbits;
 }
 
 
-- 
2.43.0
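
For readers following the conflict-horizon change in the patch above: the block removed from heap_page_prune_and_freeze() chose the later of the freeze horizon and the newest removed xid with a wraparound-aware comparison. Below is a minimal standalone sketch of that two-way choice. The names xid_follows(), later_conflict_xid() and conflict_xid_selfcheck() are made up for illustration; the comparison is modelled on TransactionIdFollows(). The body of get_conflict_xid() is not shown in this part of the thread, so this does not claim to reproduce it (it also takes the VM bits and visibility_cutoff_xid into account).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical stand-in for TransactionIdFollows(): is id1 logically later
 * than id2?  Normal xids are compared modulo 2^32; special xids fall back
 * to a plain unsigned comparison.
 */
bool
xid_follows(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;

	diff = (int32_t) (id1 - id2);
	return diff > 0;
}

/*
 * Illustration only: pick the later of the two horizons, as the removed
 * if/else on frz_conflict_horizon and latest_xid_removed did.
 */
TransactionId
later_conflict_xid(TransactionId frz_conflict_horizon,
				   TransactionId latest_xid_removed)
{
	if (xid_follows(frz_conflict_horizon, latest_xid_removed))
		return frz_conflict_horizon;
	return latest_xid_removed;
}

/* Small usage example; every value here is invented. */
void
conflict_xid_selfcheck(void)
{
	/* Freezing touched a newer xid than pruning removed. */
	assert(later_conflict_xid(100, 90) == 100);
	/* Nothing was frozen, so only the pruning horizon matters. */
	assert(later_conflict_xid(InvalidTransactionId, 90) == 90);
}
```

If neither pruning nor freezing produced a horizon, both inputs are InvalidTransactionId and so is the result, which is the value that ResolveRecoveryConflictWithSnapshot() treats as "no conflict".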



  [text/x-patch] v31-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v31-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From b30b92789f9b62e60348bd1441f03031e1bf7309 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v31 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f56a02a3d46..d22d2a86ed0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1867,9 +1867,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1886,13 +1889,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v31-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (24.6K, 9-v31-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From fb26088478e331440a2747031ba259e2adc9808e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v31 08/16] Remove XLOG_HEAP2_VISIBLE entirely

No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 109 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 370 deletions(-)
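
Since this patch leaves visibilitymap_set() as a pure bit-setting routine (locking, dirtying the heap buffer and WAL are now the caller's job), the following standalone sketch models roughly what is left. It assumes a flat toy byte array instead of a pinned VM buffer, and the names toy_vm_get(), toy_vm_set() and TOY_MAP_BYTES are made up for illustration. The flag values and the two-bits-per-heap-block layout follow visibilitymapdefs.h. The real code also maps the heap block to a VM page first, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Flag values as in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4

/* Assumption: a toy map covering 64 * 4 = 256 heap blocks. */
#define TOY_MAP_BYTES		64

/* Read the two status bits for a heap block from the toy map. */
uint8_t
toy_vm_get(const uint8_t *map, BlockNumber heapBlk)
{
	uint32_t	mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	return (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
}

/*
 * Set status bits for a heap block.  Only touches the map bytes; in the
 * server the caller holds the exclusive lock on the VM buffer and emits
 * the WAL record that covers the change.
 */
void
toy_vm_set(uint8_t *map, BlockNumber heapBlk, uint8_t flags)
{
	uint32_t	mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	assert(heapBlk < TOY_MAP_BYTES * HEAPBLOCKS_PER_BYTE);
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Must never set all_frozen without also setting all_visible */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	map[mapByte] |= (uint8_t) (flags << mapOffset);
}
```

As a usage example, calling toy_vm_set(map, 5, VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN) on a zeroed map sets bits 2 and 3 of map[1] and leaves the neighbouring blocks' bits alone.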

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 49d3ebb0063..b099483051a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * so there is no point in optimizing it.
 			 */
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
-									 params->relation->rd_locator);
+			visibilitymap_set(blockno, vmbuffer, new_vmbits,
+							  params->relation->rd_locator);
 		}
 
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d22d2a86ed0..93f0f39c5f0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1889,11 +1889,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2771,9 +2771,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index cdcb475e501..d30fee3a488 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,106 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -341,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 787c19e5fef..a6580ea6188 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v31-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (2.4K, 10-v31-0009-Simplify-heap_page_would_be_all_visible-visibili.patch)
From 667b2e7c19c70694912223bc35d8f286a439dacd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v31 09/16] Simplify heap_page_would_be_all_visible visibility
 check

heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.

Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().

This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93f0f39c5f0..e827ca21c68 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	{
 		ItemId		itemid;
 		HeapTupleData tuple;
+		TransactionId dead_after;
 
 		/*
 		 * Set the offset number so that we can display it along with any
@@ -3576,12 +3577,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
 		{
 			case HEAPTUPLE_LIVE:
 				{
 					TransactionId xmin;
 
+					Assert(!TransactionIdIsValid(dead_after));
+
 					/* Check comments in lazy_scan_prune. */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
@@ -3614,8 +3617,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				}
 				break;
 
-			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_RECENTLY_DEAD:
+				Assert(TransactionIdIsValid(dead_after));
+				/* FALLTHROUGH */
+			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				{
-- 
2.43.0
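
The relationship the 0009 commit message relies on can be sketched with a toy model. This is only an illustration under stated assumptions: every Toy* name below is hypothetical, XIDs are compared with a plain "<" (no wraparound handling), and the tuple simply carries the answer that the horizon-free check would compute. It shows that the OldestXmin step can only turn RECENTLY_DEAD into DEAD, which a caller that rejects every non-LIVE result cannot observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; not the PostgreSQL definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

typedef enum ToyResult
{
	TOY_DEAD,
	TOY_LIVE,
	TOY_RECENTLY_DEAD,
	TOY_INSERT_IN_PROGRESS,
	TOY_DELETE_IN_PROGRESS
} ToyResult;

/* Hypothetical tuple: carries only what the sketch needs. */
typedef struct ToyTuple
{
	ToyResult	horizon_result; /* what the horizon-free check reports */
	ToyXid		dead_after;		/* deleter XID if RECENTLY_DEAD, else
								 * TOY_INVALID_XID */
} ToyTuple;

/* Horizon-free classification: never needs an OldestXmin. */
static ToyResult
toy_satisfies_vacuum_horizon(const ToyTuple *tup, ToyXid *dead_after)
{
	*dead_after = tup->dead_after;
	return tup->horizon_result;
}

/*
 * The extra step: a RECENTLY_DEAD tuple becomes DEAD when its deleter is
 * older than oldest_xmin. No other result is ever changed.
 */
static ToyResult
toy_satisfies_vacuum(const ToyTuple *tup, ToyXid oldest_xmin)
{
	ToyXid		dead_after;
	ToyResult	res = toy_satisfies_vacuum_horizon(tup, &dead_after);

	if (res == TOY_RECENTLY_DEAD && dead_after < oldest_xmin)
		res = TOY_DEAD;
	return res;
}

/* The only question the all-visible check asks about each tuple. */
static bool
toy_tuple_blocks_all_visible(ToyResult res)
{
	return res != TOY_LIVE;
}
```

Since toy_tuple_blocks_all_visible() returns the same answer for TOY_DEAD and TOY_RECENTLY_DEAD, either classification function gives the same page-level verdict in this model.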



  [text/x-patch] v31-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (4.6K, 11-v31-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch)
From 365b928d060f7248e209a4e26ff914da41178730 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v31 10/16] Remove table_scan_analyze_next_tuple unneeded
 parameter OldestXmin

heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.

Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.

Suggested-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
 src/backend/access/heap/heapam_handler.c | 13 +++++++++----
 src/backend/commands/analyze.c           |  6 +-----
 src/include/access/tableam.h             |  5 ++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..8707d1aab4a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
 }
 
 static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
 							   double *liverows, double *deadrows,
 							   TupleTableSlot *slot)
 {
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;
 		bool		sample_it = false;
+		TransactionId dead_after;
 
 		itemid = PageGetItemId(targpage, hscan->rs_cindex);
 
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 		targtuple->t_len = ItemIdGetLength(itemid);
 
-		switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
-										 hscan->rs_cbuf))
+		switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+												hscan->rs_cbuf,
+												&dead_after))
 		{
 			case HEAPTUPLE_LIVE:
 				sample_it = true;
 				*liverows += 1;
 				break;
 
-			case HEAPTUPLE_DEAD:
 			case HEAPTUPLE_RECENTLY_DEAD:
+				Assert(TransactionIdIsValid(dead_after));
+				/* FALLTHROUGH */
+
+			case HEAPTUPLE_DEAD:
 				/* Count dead and recently-dead rows */
 				*deadrows += 1;
 				break;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5e2a7a8234e..184bc3dd3b2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1213,7 +1213,6 @@ acquire_sample_rows(Relation onerel, int elevel,
 	double		rowstoskip = -1;	/* -1 means not set yet */
 	uint32		randseed;		/* Seed for block sampler(s) */
 	BlockNumber totalblocks;
-	TransactionId OldestXmin;
 	BlockSamplerData bs;
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
@@ -1226,9 +1225,6 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
-	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
-	OldestXmin = GetOldestNonRemovableTransactionId(onerel);
-
 	/* Prepare for sampling block numbers */
 	randseed = pg_prng_uint32(&pg_global_prng_state);
 	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
@@ -1261,7 +1257,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	{
 		vacuum_delay_point(true);
 
-		while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+		while (table_scan_analyze_next_tuple(scan, &liverows, &deadrows, slot))
 		{
 			/*
 			 * The first targrows sample rows are simply copied into the
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..ee9b32c4620 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
 	 * callback).
 	 */
 	bool		(*scan_analyze_next_tuple) (TableScanDesc scan,
-											TransactionId OldestXmin,
 											double *liverows,
 											double *deadrows,
 											TupleTableSlot *slot);
@@ -1714,11 +1713,11 @@ table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
  * tuples.
  */
 static inline bool
-table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+table_scan_analyze_next_tuple(TableScanDesc scan,
 							  double *liverows, double *deadrows,
 							  TupleTableSlot *slot)
 {
-	return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+	return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
 															liverows, deadrows,
 															slot);
 }
-- 
2.43.0



  [text/x-patch] v31-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.9K, 12-v31-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From dc7f28b35a07c637d0bad46194816773217098b1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v31 11/16] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. As long as visibility_cutoff_xid
was maintained (that is, no tuple ruled out all-visibility during the
scan), a single GlobalVisState check per page suffices. This is safe
because visibility_cutoff_xid records the newest xmin among the live,
committed tuples on the page; if that XID is globally visible, then the
entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 53 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 38 ++++++++++-----
 src/include/access/heapam.h                 |  4 +-
 4 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index bf899c2d2c6..7d9bd28d8f0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b099483051a..c507231d2a4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e827ca21c68..7463d46891b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2725,7 +2725,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3486,7 +3486,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3502,7 +3502,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 					Assert(!TransactionIdIsValid(dead_after));
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3594,16 +3594,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3634,6 +3635,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 88e79c58a10..5657b1df46b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
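
The deferred check described in the 0011 commit message can be sketched roughly as follows. This is a simplified model, not the patch's code: ToyVisState is a hypothetical stand-in for GlobalVisState (a single horizon plus a counter of how often the expensive test ran), XIDs are compared with plain integer operators instead of TransactionIdFollows(), and every xmin in the array is assumed to belong to a live, committed tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; not the PostgreSQL definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

/* Hypothetical replacement for GlobalVisState. */
typedef struct ToyVisState
{
	ToyXid		horizon;		/* XIDs below this are visible to everyone */
	int			checks;			/* how often the "expensive" test ran */
} ToyVisState;

/* Models GlobalVisTestXidMaybeRunning(): true if some snapshot may still
 * consider xid to be running. */
static bool
toy_xid_maybe_running(ToyVisState *state, ToyXid xid)
{
	state->checks++;
	return xid >= state->horizon;
}

/*
 * Scan the xmins of the page's live, committed tuples, tracking only the
 * newest one. The horizon test runs once, after the loop, against that
 * newest xmin.
 */
static bool
toy_page_all_visible(const ToyXid *xmins, size_t ntuples,
					 ToyVisState *state, ToyXid *cutoff)
{
	*cutoff = TOY_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (xmins[i] > *cutoff)
			*cutoff = xmins[i];
	}

	if (*cutoff != TOY_INVALID_XID &&
		toy_xid_maybe_running(state, *cutoff))
		return false;

	return true;
}
```

For a page with xmins {10, 42, 17} and a horizon of 50, the function reports the page all-visible with a cutoff of 42 after a single horizon test. Moving the horizon to 42 makes the same page fail, again after one test rather than three.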



  [text/x-patch] v31-0012-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 13-v31-0012-Unset-all_visible-sooner-if-not-freezing.patch)
From 835c3a565d2148fc6a0d79c37de70b7c586edbff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v31 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c507231d2a4..8e59e7692c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all_visible and all_frozen now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v31-0013-Track-which-relations-are-modified-by-a-query.patch (2.6K, 14-v31-0013-Track-which-relations-are-modified-by-a-query.patch)
From da8d2f225d0fc42bdecad87f55a7e86518c068cc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v31 13/16] Track which relations are modified by a query

Save the range table indexes of the query's result relations and of
relations with row marks in a bitmapset in the EState. A later commit
will pass this information down to scan nodes to control whether the
scan may set the visibility map during on-access pruning. We don't want
to set the visibility map if the query is just going to modify the page
immediately after.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..9df7df17e96 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..13b42b5e6d1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
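
How a scan node might consume es_modified_relids can be sketched as below. This is a guess at the shape, not the patch's code: a 64-bit mask stands in for Bitmapset, all toy_* helpers are hypothetical, and the flag value is invented. Only the flag's name is borrowed from SO_HINT_REL_READ_ONLY, which 0014 tests in heapam_index_fetch_begin().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for Bitmapset; only handles RT indexes 1..63. */
typedef uint64_t ToyRelidSet;

/* Hypothetical flag value, named after SO_HINT_REL_READ_ONLY in 0014. */
#define TOY_HINT_REL_READ_ONLY	(1u << 0)

/* Models bms_add_member(): record that the query may modify relation rti. */
static ToyRelidSet
toy_add_member(ToyRelidSet set, int rti)
{
	return set | (UINT64_C(1) << rti);
}

/* Models bms_is_member(). */
static bool
toy_is_member(int rti, ToyRelidSet set)
{
	return ((set >> rti) & 1) != 0;
}

/*
 * Choose scan flags for one scan node: hint "read only" unless the scanned
 * relation is a result relation or carries a row mark.
 */
static uint32_t
toy_scan_flags(int scanrelid, ToyRelidSet modified_relids)
{
	if (toy_is_member(scanrelid, modified_relids))
		return 0;
	return TOY_HINT_REL_READ_ONLY;
}
```

With RT index 2 recorded as modified, toy_scan_flags(2, set) returns 0 while toy_scan_flags(1, set) returns the read-only hint, so only the scan of the unmodified relation would be allowed to set the VM during on-access pruning.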



  [text/x-patch] v31-0014-Pass-down-information-on-table-modification-to-s.patch (23.7K, 15-v31-0014-Pass-down-information-on-table-modification-to-s.patch)
From 30c13a767fb0d9604595cda0fe3fce238a54df5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v31 14/16] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 93 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 45d306037a4..5c4bf5f0c6e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fc6af7c751b..b2457b96dcc 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8707d1aab4a..fc251e11f8a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5657b1df46b..ba62a4d4cba 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ee9b32c4620..15fad66ed87 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v31-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.0K, 16-v31-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 8ff462278736c7fa1de096f43e805a92c68a5b07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v31 15/16] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 ++++++-
 src/backend/access/heap/heapam_handler.c      | 15 ++++++-
 src/backend/access/heap/pruneheap.c           | 40 ++++++++++++++++++-
 src/include/access/heapam.h                   | 24 +++++++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fc251e11f8a..6946da8c9d7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8e59e7692c1..f414f02964d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
 								  Relation relation,
 								  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 								  Buffer vmbuffer,
+								  PruneReason reason,
+								  bool do_prune, bool do_freeze,
 								  int nlpdead_items,
 								  uint8 *old_vmbits,
 								  uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
  * corrupted, it will fix them by clearing the VM bits and visibility hint.
  * This does not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with returning the
  * current value of the VM bits in *old_vmbits and the desired new value of
  * the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
 					  Relation relation,
 					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
 					  Buffer vmbuffer,
+					  PruneReason reason,
+					  bool do_prune, bool do_freeze,
 					  int nlpdead_items,
 					  uint8 *old_vmbits,
 					  uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
 	if (!prstate->attempt_update_vm)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	*old_vmbits = visibilitymap_get_status(relation, heap_blk,
 										   &vmbuffer);
 
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  buffer,
 									  page,
 									  vmbuffer,
+									  params->reason,
+									  do_prune, do_freeze,
 									  prstate.lpdead_items,
 									  &old_vmbits,
 									  &new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ba62a4d4cba..b0e7c71463c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v31-0016-Set-pd_prune_xid-on-insert.patch (6.7K, 17-v31-0016-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From d7af575ebac98821654d7cc57091d1273f2b1d86 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v31 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-23 00:00  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Chao Li @ 2025-12-23 00:00 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Dec 23, 2025, at 01:57, Melanie Plageman <[email protected]> wrote:
> 
> On Mon, Dec 22, 2025 at 2:20 AM Chao Li <[email protected]> wrote:
>> 
>> A few more comments on v29:
> 
> Thanks for the continued review! I've attached v30.
> 
>> 1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
> 
> I was torn about whether or not to change the return value. Coverity
> doesn't always warn about unused return values. Usually it warns if it
> perceives the return value as needed for error checking or if it
> thinks not using the return value is incorrect. It may still warn in
> this case, but it's not obvious to me which way it would go.
> 
> I have changed the function signature as you suggested in v30.
> 
> My hesitation is that visibilitymap_set() is in a header file and
> could be used by extensions/forks, etc. Adding more information by
> changing a return value from void to non-void doesn't have any
> negative effect on those potential callers. But taking away a return
> value is more likely to affect them in a potentially negative way.
> 
> However, I'm significantly changing the signature in this release, so
> everybody that used it will have to change their code completely
> anyway. Also, I just added a return value for visibilitymap_set() in
> the previous release (18). Historically, it returned void. So, I've
> gone with your suggestion.

From a previous patch, I learned from Peter Eisentraut that “We don't care about ABI changes in major releases.”, see:

https://www.postgresql.org/message-id/70913dbd-dadf-4560-9f81-c0df72bf6578%40eisentraut.org

>> -        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
>> -        * will return 'all_visible', 'all_frozen' flags to the caller.
>> +        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
>> 
>> Nit: a trailing period is needed at the end of the comment line.
> 
> I've changed it. One interesting thing is that our "policy" for
> periods in comments is that we don't put periods at the end of
> one-line comments and we do put them at the end of multi-line comment
> sentences. This is a one-line comment inside a comment block, so I
> wasn't sure what to do. If you noticed it, and it bothered you, it's
> easy enough to change, though.

If this were a one-line comment, I would not have cared about the trailing period.

The problem is that this is a paragraph of a block comment, and the paragraphs above and below all have trailing periods. So, for consistency, I raised the comment.
```
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.   <=== Has a trailing period
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples <=== No trailing period
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 	 * in the VM.                                 <=== Has a trailing period
```

> 
>> 9 - 0006
>> 
>> @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
>>        {
>>                ItemId          itemid;
>>                HeapTupleData tuple;
>> +               TransactionId dead_after = InvalidTransactionId;
>> ```
>> 
>> This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always assign a value to it.
> 
> I think this is a comment for a later patch in the set (you originally
> said it was from 0006), but I've changed dead_after to not be
> initialized like this.

My bad. This comment was actually for 0009. In v31, I see you have removed the initialization to dead_after.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2025-12-23 01:18  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2025-12-23 01:18 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Dec 22, 2025 at 7:01 PM Chao Li <[email protected]> wrote:
>
> > On Dec 23, 2025, at 01:57, Melanie Plageman <[email protected]> wrote:
> >
> > My hesitation is that visibilitymap_set() is in a header file and
> > could be used by extensions/forks, etc. Adding more information by
> > changing a return value from void to non-void doesn't have any
> > negative effect on those potential callers. But taking away a return
> > value is more likely to affect them in a potentially negative way.
> >
> > However, I'm significantly changing the signature in this release, so
> > everybody that used it will have to change their code completely
> > anyway. Also, I just added a return value for visibilitymap_set() in
> > the previous release (18). Historically, it returned void. So, I've
> > gone with your suggestion.
>
> From a previous patch, I learned from Peter Eisentraut that “We don't care about ABI changes in major releases.”, see:

Right, it is totally okay to change function APIs in a major release.
My point was not that it wasn't allowed but that if people are getting
useful information returned from that function, or if we think we
might want that information again in the future, we should think twice
before changing it. But, in this case, I think we don't need to worry
about it.

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-02 23:38  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-02 23:38 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Feb 20, 2026 at 12:59 PM Andres Freund <[email protected]> wrote:
>
> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
>
> > I could see an argument for moving identify_and_fix_vm_corruption()
> > out of the helper and into heap_page_prune_and_freeze() but then we'd
> > have to move visibilitymap_get_status() out too. And that takes away a
> > lot of the benefit of encapsulating all that logic.
>
> I was wondering about that option. Relatedly, I also was wondering if we ought
> to do identify_and_fix_vm_corruption() regardless of ->attempt_update_vm.

Attached v35 does this. I always pin the vmbuffer if we are going to
prune in heap_page_prune_opt(). In many cases, because it's saved in
the scan descriptor, it won't actually need to take a new pin. During
pruning, I check for VM corruption even if I am not considering
setting the VM.
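
To illustrate why a vmbuffer saved in the scan descriptor rarely needs a new pin, here is a minimal sketch. It assumes the default 8 kB block size and a 24-byte page header. CachedVmPin, heapblk_to_vmblk() and cached_vm_pin() are hypothetical names for illustration, not functions from the patch set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default 8 kB block size and a 24-byte page header. */
#define MAPSIZE              (8192 - 24)
#define HEAPBLOCKS_PER_BYTE  4      /* two VM bits per heap block */
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Hypothetical stand-in for a VM buffer pin cached in a scan descriptor. */
typedef struct CachedVmPin
{
    bool     valid;         /* do we currently hold a pin? */
    uint32_t vm_block;      /* which VM block is pinned */
    int      pins_taken;    /* how many times a new pin was needed */
} CachedVmPin;

/* Map a heap block number to the VM block that holds its bits. */
uint32_t
heapblk_to_vmblk(uint32_t heap_block)
{
    return heap_block / HEAPBLOCKS_PER_PAGE;
}

/* Make sure the right VM block is pinned; take a new pin only on a miss. */
void
cached_vm_pin(CachedVmPin *cache, uint32_t heap_block)
{
    uint32_t want = heapblk_to_vmblk(heap_block);

    if (cache->valid && cache->vm_block == want)
        return;             /* existing pin already covers this heap block */

    cache->valid = true;
    cache->vm_block = want;
    cache->pins_taken++;
}
```

Under these assumptions one VM block covers 32672 heap blocks, so a scan that walks the heap in order would only need a new pin once per 32672 heap pages.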

> > Well, after this patch set, clearing the VM does happen before we emit
> > WAL for pruning.
>
> That I think is a substantial improvement, the current (i.e. before your
> series) placement really is pretty insane due to the guaranteed divergence it
> causes.
>
> I wonder if we actually should just force an FPI whenever we detect such
> corruption, that way it would be reliably fixed on the standby as well.

Only problem is we would have to do an FPI of the VM page as well if
we wanted the corruption to be reliably fixed on the standby.

> > It wouldn't be hard to move the corruption fixups to the beginning of
> > heap_page_prune_and_freeze() in the new code structure.
>
> As identify_and_fix_vm_corruption() needs lpdead_items, I'm not sure that's
> true?
>
> I wonder if at least the warning for the "(PageIsAllVisible(heap_page) &&
> nlpdead_items > 0)" test should be moved to
> heap_prune_record_dead_or_unused(). That way the WARNING could include the
> offset number and it'd also work in the mark_unused_now case.
>
> Perhaps it also should trigger for RECENTLY_DEAD, INSERT_IN_PROGRESS,
> DELETE_IN_PROGRESS?
>
> At that point the !page_all_visible && vm_all_visible part could indeed be
> moved to the start of heap_page_prune_and_freeze()

I've done all this. There is a heap page/VM corruption check at the
beginning of heap_page_prune_and_freeze() and then checking for
corruption during pruning in the previously covered case (lpdead
items) as well as the mark_unused_now case, and
RECENTLY_DEAD/INSERT_IN_PROGRESS/DELETE_IN_PROGRESS.

> > Would it be worth it? What benefit would we get? Do you just feel that it
> > should logically come first?
>
> One insanity is that right now we will process all frozen pages over and over
> due to the skip pages threshold, wasting a *lot* of CPU and memory bandwidth.
> It'd be quite defensible to just skip processing the page once we determined
> it's already all frozen.  But for that we'd probably want to do the
> "page_all_visible && vm_all_visible" check before returning...

I've added a fast path to bypass pruning/freezing when the page is
already all-visible. And I check for pg_all_visible && vm_all_visible
beforehand. The one downside is that if a page is marked all-frozen
but has dead tuples on it, we'll never get to fix that corruption or
clean up the dead tuples. But the fast path seems worth it to me.
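
The up-front decision could be sketched roughly as follows. This is an illustration under assumptions, not the code in 0007: PrunePath and choose_prune_path() are hypothetical, and the exact conditions in the patch may differ. The bit values are meant to mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define VM_ALL_VISIBLE  0x01
#define VM_ALL_FROZEN   0x02

/* Hypothetical outcome of the up-front check; not a type from the patch. */
typedef enum PrunePath
{
    PRUNE_PATH_SKIP,            /* page flag and VM agree: bypass pruning */
    PRUNE_PATH_CLEAR_VM_FIRST,  /* VM bit set but page flag is not: fix first */
    PRUNE_PATH_FULL             /* ordinary prune/freeze processing */
} PrunePath;

PrunePath
choose_prune_path(bool page_all_visible, uint8_t vmbits)
{
    bool vm_all_visible = (vmbits & VM_ALL_VISIBLE) != 0;

    /* The VM claims all-visible but the heap page does not: corruption. */
    if (!page_all_visible && vm_all_visible)
        return PRUNE_PATH_CLEAR_VM_FIRST;

    /* Both agree the page is all-visible: take the fast path. */
    if (page_all_visible && vm_all_visible)
        return PRUNE_PATH_SKIP;

    return PRUNE_PATH_FULL;
}
```

In this model, a page that takes PRUNE_PATH_SKIP is never examined tuple by tuple, which is exactly why dead tuples on a wrongly marked page would go unnoticed.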

> > > Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
> > > HEAP_PAGE_PRUNE_UPDATE_VM would be set?
> >
> > Yes, when setting the VM on-access, it is too expensive to call
> > heap_prepare_freeze_tuple() on each tuple. I could work on trying to
> > optimize it, but it isn't currently viable.
>
> Is it too expensive to do so even when we already decided to do some pruning?
> I am not surprised it's too expensive when there's not even a dead tuple on
> the page.  But I am mildly surprised if it's too expensive to do when we'd WAL
> log anyway?

It's not really possible in the current code structure to only call
heap_prepare_freeze_tuple() when there are at least some prunable
tuples. We go through the line pointers and record them as prunable at
the same time we call heap_prepare_freeze_tuple(), so we won't know
until we've examined all line pointers that there are no prunable
tuples, at which point we will have called heap_prepare_freeze_tuple()
for every tuple.

> > I think using all_frozen_except_dead while maintaining
> > visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
> > the potential to be confusing, though. We'd need to keep updating
> > visibility_cutoff_xid when all_visible is false but
> > all_frozen_except_dead is true as well as when all_visible is true.
> > And because we don't care about all_visible_except_dead, it gets even
> > more confusing to make sure we are maintaining the right variables in
> > the right situations.
>
> I suspect we should just track all of the horizons/cutoffs all the time. This
> whole stuff about optimizing out a few conditional assignments complicates the
> code substantially and feels extremely error prone to me.

I've done this in v35. I posted the freeze horizon tracking patch
separately in [1] but it is in v35 as 0004. Tracking the newest live
xid is in 0009. This also always tracks all_visible for all callers
since I unconditionally pass the vmbuffer now. I still don't set the
VM if the query is modifying the relation, though.

> I probably complained about this before, and it's not this patch's fault, but
> PruneState->{all_visible,all_frozen} are imo confusingly named, due to
> sounding like they describe the current state, rather than the possible state
> after pruning.  It's not helped by this comment:
>
>          * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
>          * That's convenient for heap_page_prune_and_freeze() to use them to
>          * decide whether to opportunistically freeze the page or not.  The
>          * all_visible and all_frozen values ultimately used to set the VM are
>          * adjusted to include LP_DEAD items after we determine whether or not to
>          * opportunistically freeze.
>
> "all-visible ... are adjusted to include LP_DEAD" ... - just reading that it's
> hard to know what it means.

0003 does the rename.

> The first thing to improve pruning performance that I would do is to introduce
> a fastpath for pages that a) are already frozen b) do not have dead items (if
> we're not freezing). Iterating through HOT chains is far from cheap, and if
> all rows are live, there's not really a point in doing so.  This is
> particularly important for VACUUMs where we end up freezing a ton of pages that
> are already frozen, due to the silly skip_pages_threshold thing.

0007 adds a fast path.

> > +static TransactionId
> > +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
> > +                              uint8 old_vmbits, uint8 new_vmbits,
> > +                              TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
> > +                              TransactionId visibility_cutoff_xid)
> > +{
> > +     TransactionId conflict_xid;
> > +
> > +     /*
> > +      * We can omit the snapshot conflict horizon if we are not pruning or
> > +      * freezing any tuples and are setting an already all-visible page
> > +      * all-frozen in the VM.
>
> Maybe mention when this can happen, because it's not immediately obvious.

I've added this to my TODO. I honestly can't think of a scenario where
it can happen. But I remember spending quite a bit of time thinking
about it on another occasion. The current code (in master) does
specifically account for this scenario, which is why I kept the logic,
but I'm not sure how it can happen.

I made all the other changes to specific comments you mentioned in
your mail but I won't bore you with itemization.

> >       if (do_set_vm)
> >               conflict_xid = visibility_cutoff_xid;
> >       else if (do_freeze)
> >               conflict_xid = frz_conflict_horizon;
> >       else
> >               conflict_xid = InvalidTransactionId;
>
> Could it be worth checking that if (do_set_vm && do_freeze) the
> frz_conflict_horizon won't be "violated" by using visibility_cutoff_xid instead?

Yes, as you mentioned off-list, this wasn't right. The new code looks like this:

TransactionId conflict_xid = InvalidTransactionId;
...
    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
        conflict_xid = newest_frozen_xid;
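
For readers who want to try the selection logic in isolation, here is a self-contained sketch. It covers only the two branches shown above. xid_follows() is a simplified model of TransactionIdFollows(), and choose_conflict_xid() is a hypothetical wrapper, not a function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId        ((TransactionId) 0)
#define FirstNormalTransactionId    ((TransactionId) 3)

bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "id1 is newer than id2", modeled on TransactionIdFollows(). */
bool
xid_follows(TransactionId id1, TransactionId id2)
{
    /* Special xids compare as plain unsigned integers. */
    if (!xid_is_normal(id1) || !xid_is_normal(id2))
        return id1 > id2;

    /* Normal xids compare modulo 2^32. */
    return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical wrapper: pick the newer of the two applicable horizons. */
TransactionId
choose_conflict_xid(bool do_set_vm, bool do_freeze,
                    TransactionId newest_live_xid,
                    TransactionId newest_frozen_xid)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && xid_follows(newest_frozen_xid, conflict_xid))
        conflict_xid = newest_frozen_xid;

    return conflict_xid;
}
```

Because the second test is no longer an else branch, a record that both sets the VM and freezes ends up with whichever horizon is newer, including across xid wraparound.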

> > From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 17 Dec 2025 16:51:05 -0500
> > Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
> >  level visibility
> >
> > [...]
> >
> > Because comparing a transaction ID against GlobalVisState is more
> > expensive than comparing against a single XID, we defer this check until
> > after scanning all tuples on the page.
>
> Curious, is this a precaution or was this a measurable bottleneck?

I did see GlobalVisTestXidMaybeRunning() in a profile I did when it
was still called for every HEAPTUPLE_LIVE tuple in
heap_prune_record_unchanged_lp_normal(), but I don't have the profile
or test case around anymore.

However, since I now unconditionally maintain the newest_live_xid,
moving GlobalVisTestXidMaybeRunning() back into
heap_prune_record_unchanged_lp_normal() wouldn't help us avoid any
work. It would just make the values of prstate.set_all_visible and
prstate.set_all_frozen more accurate sooner. But I don't think it's
worth the extra function call since set_all_frozen and set_all_visible
won't be totally "done" until after we decide whether or not to
opportunistically freeze anyway.
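
The shape of the deferred check can be sketched as below. This is an assumption-laden model: HorizonStub and xid_maybe_running() are hypothetical stand-ins for GlobalVisState and GlobalVisTestXidMaybeRunning(), and page_can_be_all_visible() is not a function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId        ((TransactionId) 0)
#define FirstNormalTransactionId    ((TransactionId) 3)

/* Hypothetical stand-in for an expensive horizon test. */
typedef struct HorizonStub
{
    TransactionId oldest_maybe_running; /* xids at or after this may run */
    int           calls;                /* how often the test was invoked */
} HorizonStub;

bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware comparison of two normal xids: is a newer than b? */
bool
normal_xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) > 0;
}

/* The "expensive" test; counts its calls so the saving is visible. */
bool
xid_maybe_running(HorizonStub *horizon, TransactionId xid)
{
    horizon->calls++;
    return !normal_xid_follows(horizon->oldest_maybe_running, xid);
}

/* Track the newest xmin cheaply per tuple, then test it once per page. */
bool
page_can_be_all_visible(const TransactionId *xmins, int ntuples,
                        HorizonStub *horizon)
{
    TransactionId newest_live_xid = InvalidTransactionId;

    for (int i = 0; i < ntuples; i++)
    {
        if (!xid_is_normal(xmins[i]))
            continue;
        if (!xid_is_normal(newest_live_xid) ||
            normal_xid_follows(xmins[i], newest_live_xid))
            newest_live_xid = xmins[i];
    }

    if (xid_is_normal(newest_live_xid) &&
        xid_maybe_running(horizon, newest_live_xid))
        return false;

    return true;
}
```

The per-tuple work is a single comparison, and the horizon test runs at most once per page regardless of how many live tuples it holds.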

> > @@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >       prune_freeze_plan(RelationGetRelid(params->relation),
> >                                         buffer, &prstate, off_loc);
> >
> > +     /*
> > +      * After processing all the live tuples on the page, if the newest xmin
> > +      * amongst them may be considered running by any snapshot, the page cannot
> > +      * be all-visible.
> > +      */
> > +     if (prstate.all_visible &&
> > +             TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
>
> Any reason to test IsNormal rather than just IsValid()?  There should never be
> a reason it's a valid but not "normal" xid, right?

Well the reason I did this was that the existing code in master
tracking visibility_cutoff_xid only advances it if
TransactionIdIsNormal(). I'm a bit confused about it too because it
seems like we would still want to do it for bootstrap mode xids. But I
see PageSetPrunable() only allows normal xids.
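
For reference, the difference between the two tests comes down to the reserved xids. A minimal sketch, using local helper names (xid_is_valid, xid_is_normal) as stand-ins for the TransactionIdIsValid() and TransactionIdIsNormal() macros:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Reserved transaction IDs; everything from 3 upward is "normal". */
#define InvalidTransactionId        ((TransactionId) 0)
#define BootstrapTransactionId      ((TransactionId) 1)
#define FrozenTransactionId         ((TransactionId) 2)
#define FirstNormalTransactionId    ((TransactionId) 3)

/* Stand-in for TransactionIdIsValid(). */
bool
xid_is_valid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

/* Stand-in for TransactionIdIsNormal(). */
bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}
```

So the bootstrap and frozen xids are the only values that are valid but not normal, which is why the two tests differ only in bootstrap mode or for frozen tuples.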

> > @@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
> >                               }
> >
> >                               /*
> > -                              * The inserter definitely committed.  But is it old enough
> > -                              * that everyone sees it as committed?  A FrozenTransactionId
> > -                              * is seen as committed to everyone.  Otherwise, we check if
> > -                              * there is a snapshot that considers this xid to still be
> > -                              * running, and if so, we don't consider the page all-visible.
> > +                              * The inserter definitely committed. But we don't know if it
> > +                              * is old enough that everyone sees it as committed. Later,
> > +                              * after processing all the tuples on the page, we'll check if
> > +                              * there is any snapshot that still considers the newest xid
> > +                              * on the page to be running. If so, we don't consider the
> > +                              * page all-visible.
> >                                */
> >                               xmin = HeapTupleHeaderGetXmin(htup);
> >
> > -                             /*
> > -                              * For now always use prstate->cutoffs for this test, because
> > -                              * we only update 'all_visible' and 'all_frozen' when freezing
> > -                              * is requested. We could use GlobalVisTestIsRemovableXid
> > -                              * instead, if a non-freezing caller wanted to set the VM bit.
> > -                              */
> > -                             Assert(prstate->cutoffs);
> > -                             if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
> > -                             {
> > -                                     prstate->all_visible = false;
> > -                                     prstate->all_frozen = false;
> > -                                     break;
> > -                             }
> > -
> >                               /* Track newest xmin on page. */
> >                               if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
> >                                       TransactionIdIsNormal(xmin))
>
> Kinda wonder if this code should be in something like
> heap_prune_record_freezable() or such, rather than be inside
> heap_prune_record_unchanged_lp_normal().

I played around with it, but it all felt a bit awkward. I wrote it
down for a future enhancement idea.

> > Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
> >
> > In the prune/freeze path, we currently delay clearing all_visible and
> > all_frozen in the presence of dead items to allow opportunistic
> > freezing.
> >
> > However, if no freezing will be attempted, there’s no need to delay.
> > Clearing the flags earlier avoids extra bookkeeping in
> > heap_prune_record_unchanged_lp_normal(). This currently has no runtime
> > effect because all callers that consider setting the VM also prepare
> > freeze plans, but upcoming changes will allow on-access pruning to set
> > the VM without freezing. The extra bookkeeping was noticeable in a
> > profile of on-access VM setting.
>
> What workload was that?

It was a select * offset all query with a few fat tuples on each page
and none of them prunable. I'm planning on digging up the
case/creating a new one to see if it is reproducible. This was with an
older version of the code that had more conditionals as well. This
commit is actually dropped in v35 because I now always keep
newest_live_xid up-to-date (0009) which means unsetting
set_all_visible sooner has no benefit.

> Theoretically, even if we don't freeze, the page still may be all-visible or
> all frozen after the removal of dead items, no? Practically that won't happen,
> because we don't remove dead items in any of the relevant paths, but from the
> commit message and comments that's not entirely clear.

Yea, it's clearer with the commit dropped.

> > @@ -678,6 +678,12 @@ typedef struct EState
> >                                                                        * ExecDoInitialPruning() */
> >       const char *es_sourceText;      /* Source text from QueryDesc */
> >
> > +     /*
> > +      * RT indexes of relations modified by the query through a
> > +      * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
> > +      */
> > +     Bitmapset  *es_modified_relids;
> > +
>
> Other EState fields are initialized in CreateExecutorState, this isn't afaict?

Oops, yes. I based it on es_unpruned_relids which wasn't initialized
there either. I've added a commit (0013) to initialize a few EState
fields that weren't initialized in CreateExecutorState() as well.

> Wonder if it's worth adding a crosscheck somewhere, verifying that if a
> relation is modified, it's in es_modified_relids. Otherwise this could very
> well silently get out of date.

Done in v35 (0014).
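
The idea behind the cross-check can be modeled with a plain bitmask. This is only a sketch under assumptions: RelidSet is a hypothetical stand-in for a Bitmapset limited to range-table indexes 1..63, and none of these function names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of range-table indexes (1..63). */
typedef struct RelidSet
{
    uint64_t bits;
} RelidSet;

/* Record that the query modifies the relation with this RT index. */
void
relidset_add(RelidSet *set, int rti)
{
    assert(rti >= 1 && rti < 64);
    set->bits |= UINT64_C(1) << rti;
}

bool
relidset_is_member(const RelidSet *set, int rti)
{
    assert(rti >= 1 && rti < 64);
    return (set->bits & (UINT64_C(1) << rti)) != 0;
}

/* A scan may consider setting the VM only if its relation is not modified. */
bool
scan_may_set_vm(const RelidSet *modified, int rti)
{
    return !relidset_is_member(modified, rti);
}

/* Cross-check to run at the point where a relation is actually modified. */
void
assert_modification_registered(const RelidSet *modified, int rti)
{
    assert(relidset_is_member(modified, rti));
}
```

If a new code path modified a relation without registering it first, the assertion would fire in assert-enabled builds instead of letting the set silently go stale.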

> Also, there's some overlap between the information collected this way, and
> AcquireExecutorLocks(), ScanQueryForLocks(), which determine the needed lock
> modes via rte->rellockmode.

Those are in parser/planner, so it doesn't seem like a good fit. I
populate es_modified_relids in the executor.

I don't know exactly what the overlap would be between RTEs with an
exclusive rellockmode and es_modified_relids. It seems like you could
have RTEs which don't end up getting modified that have a lock level
that would have made you think that they would be modified.

But were you imagining a substitution or a cross-check?

> > From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 3 Dec 2025 15:12:18 -0500
> > Subject: [PATCH v34 12/14] Pass down information on table modification to scan
> >  node
>
> Perhaps worth splitting up, so the addition of the 0 flag is separate from
> the read only hint aspect.

Done.

[1] https://www.postgresql.org/message-id/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gma...


Attachments:

  [text/x-patch] v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch (16.4K, 2-v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch)
  download | inline diff:
From 7526e2a0e7d1a013cb9f4d95dff8a4feabd7035b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 26 Feb 2026 10:09:55 -0500
Subject: [PATCH v35 01/18] Move commonly used context into PruneState and
 simplify helpers

heap_page_prune_and_freeze() and many of its helpers use the heap
buffer, block number, and page. Other helpers took the heap page and
didn't use it. Initializing these values once during
prune_freeze_setup() simplifies the helpers' interfaces and avoids any
repeated calls to BufferGetBlockNumber() and BufferGetPage().

While updating PruneState, also reorganize its fields to make layout and
documentation more consistent.
---
 src/backend/access/heap/pruneheap.c | 136 +++++++++++++++-------------
 1 file changed, 72 insertions(+), 64 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 632c2427952..3c5d33834fc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,16 @@ typedef struct
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
 	struct VacuumCutoffs *cutoffs;
+	Relation	relation;
+
+	/*
+	 * Keep the buffer, block, and page handy so that helpers needing to
+	 * access them don't need to make repeated calls to BufferGetBlockNumber()
+	 * and BufferGetPage().
+	 */
+	BlockNumber block;
+	Buffer		buffer;
+	Page		page;
 
 	/*-------------------------------------------------------
 	 * Fields describing what to do to the page
@@ -98,11 +108,19 @@ typedef struct
 	 */
 	int8		htsv[MaxHeapTuplesPerPage + 1];
 
-	/*
-	 * Freezing-related state.
+	/*-------------------------------------------------------
+	 * Working state for freezing
+	 *-------------------------------------------------------
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*
+	 * The snapshot conflict horizon used when freezing tuples. The final
+	 * snapshot conflict horizon for the record may be newer if pruning
+	 * removes newer transaction IDs.
+	 */
+	TransactionId frz_conflict_horizon;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -129,13 +147,6 @@ typedef struct
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
 
-	/*
-	 * The snapshot conflict horizon used when freezing tuples. The final
-	 * snapshot conflict horizon for the record may be newer if pruning
-	 * removes newer transaction IDs.
-	 */
-	TransactionId frz_conflict_horizon;
-
 	/*
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
@@ -162,14 +173,12 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
-static void prune_freeze_plan(Oid reloid, Buffer buffer,
-							  PruneState *prstate,
+static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
-											   HeapTuple tup,
-											   Buffer buffer);
+											   HeapTuple tup);
 static inline HTSV_Result htsv_get_valid_status(int status);
-static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
+static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
 static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
@@ -181,15 +190,14 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
 											 bool was_normal);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 
-static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
 static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
 
 static void page_verify_redirects(Page page);
 
-static bool heap_page_will_freeze(Relation relation, Buffer buffer,
-								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
+static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
 
 
@@ -342,6 +350,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate->cutoffs = params->cutoffs;
+	prstate->relation = params->relation;
+	prstate->block = BufferGetBlockNumber(params->buffer);
+	prstate->buffer = params->buffer;
+	prstate->page = BufferGetPage(params->buffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -455,16 +467,15 @@ prune_freeze_setup(PruneFreezeParams *params,
  * *off_loc is used for error callback and cleared before returning.
  */
 static void
-prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
-				  OffsetNumber *off_loc)
+prune_freeze_plan(PruneState *prstate, OffsetNumber *off_loc)
 {
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	Page		page = prstate->page;
+	BlockNumber blockno = prstate->block;
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	OffsetNumber offnum;
 	HeapTupleData tup;
 
-	tup.t_tableOid = reloid;
+	tup.t_tableOid = RelationGetRelid(prstate->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -505,7 +516,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(prstate, offnum);
 			continue;
 		}
 
@@ -518,7 +529,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 			if (unlikely(prstate->mark_unused_now))
 				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(prstate, offnum);
 			continue;
 		}
 
@@ -539,8 +550,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 		tup.t_len = ItemIdGetLength(itemid);
 		ItemPointerSet(&tup.t_self, blockno, offnum);
 
-		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
-															buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
 			prstate->root_items[prstate->nroot_items++] = offnum;
@@ -571,7 +581,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
+		heap_prune_chain(maxoff, offnum, prstate);
 	}
 
 	/*
@@ -627,7 +637,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -648,7 +658,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 
 /*
  * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
+ * prepared for the current heap buffer. If freezing is chosen, this function
  * performs several pre-freeze checks.
  *
  * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
@@ -660,8 +670,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
  * page, and false otherwise.
  */
 static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
-					  bool did_tuple_hint_fpi,
+heap_page_will_freeze(bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
 					  PruneState *prstate)
@@ -709,18 +718,19 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
 			 */
-			if (RelationNeedsWAL(relation))
+			if (RelationNeedsWAL(prstate->relation))
 			{
 				if (did_tuple_hint_fpi)
 					do_freeze = true;
 				else if (do_prune)
 				{
-					if (XLogCheckBufferNeedsBackup(buffer))
+					if (XLogCheckBufferNeedsBackup(prstate->buffer))
 						do_freeze = true;
 				}
 				else if (do_hint_prune)
 				{
-					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+					if (XLogHintBitIsNeeded() &&
+						XLogCheckBufferNeedsBackup(prstate->buffer))
 						do_freeze = true;
 				}
 			}
@@ -733,7 +743,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * Validate the tuples we will be freezing before entering the
 		 * critical section.
 		 */
-		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+		heap_pre_freeze_checks(prstate->buffer, prstate->frozen, prstate->nfrozen);
 
 		/*
 		 * Calculate what the snapshot conflict horizon should be for a record
@@ -822,8 +832,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
-	Buffer		buffer = params->buffer;
-	Page		page = BufferGetPage(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -842,8 +850,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Prepare queue of state changes to later be executed in a critical
 	 * section.
 	 */
-	prune_freeze_plan(RelationGetRelid(params->relation),
-					  buffer, &prstate, off_loc);
+	prune_freeze_plan(&prstate, off_loc);
 
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
@@ -861,15 +868,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
+	do_hint_prune = ((PageHeader) prstate.page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(prstate.page);
 
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = heap_page_will_freeze(params->relation, buffer,
-									  did_tuple_hint_fpi,
+	do_freeze = heap_page_will_freeze(did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
 									  &prstate);
@@ -901,14 +907,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
 		 * XID of any soon-prunable tuple.
 		 */
-		((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
+		((PageHeader) prstate.page)->pd_prune_xid = prstate.new_prune_xid;
 
 		/*
 		 * Also clear the "page is full" flag, since there's no point in
 		 * repeating the prune/defrag process until something else happens to
 		 * the page.
 		 */
-		PageClearFull(page);
+		PageClearFull(prstate.page);
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
@@ -916,7 +922,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * the buffer dirty below.
 		 */
 		if (!do_freeze && !do_prune)
-			MarkBufferDirtyHint(buffer, true);
+			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
 	if (do_prune || do_freeze)
@@ -924,21 +930,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
 		{
-			heap_page_prune_execute(buffer, false,
+			heap_page_prune_execute(prstate.buffer, false,
 									prstate.redirected, prstate.nredirected,
 									prstate.nowdead, prstate.ndead,
 									prstate.nowunused, prstate.nunused);
 		}
 
 		if (do_freeze)
-			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		MarkBufferDirty(prstate.buffer);
 
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(params->relation))
+		if (RelationNeedsWAL(prstate.relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -958,7 +964,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(params->relation, buffer,
+			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
@@ -1018,12 +1024,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
  * Perform visibility checks for heap pruning.
  */
 static HTSV_Result
-heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
+heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 {
 	HTSV_Result res;
 	TransactionId dead_after;
 
-	res = HeapTupleSatisfiesVacuumHorizon(tup, buffer, &dead_after);
+	res = HeapTupleSatisfiesVacuumHorizon(tup, prstate->buffer, &dead_after);
 
 	if (res != HEAPTUPLE_RECENTLY_DEAD)
 		return res;
@@ -1100,13 +1106,14 @@ htsv_get_valid_status(int status)
  * based on that outcome.
  */
 static void
-heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
-				 OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(OffsetNumber maxoff, OffsetNumber rootoffnum,
+				 PruneState *prstate)
 {
 	TransactionId priorXmax = InvalidTransactionId;
 	ItemId		rootlp;
 	OffsetNumber offnum;
 	OffsetNumber chainitems[MaxHeapTuplesPerPage];
+	Page		page = prstate->page;
 
 	/*
 	 * After traversing the HOT chain, ndeadchain is the index in chainitems
@@ -1235,7 +1242,7 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
 		/*
 		 * Advance to next chain member.
 		 */
-		Assert(ItemPointerGetBlockNumber(&htup->t_ctid) == blockno);
+		Assert(ItemPointerGetBlockNumber(&htup->t_ctid) == prstate->block);
 		offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
 		priorXmax = HeapTupleHeaderGetUpdateXid(htup);
 	}
@@ -1270,7 +1277,7 @@ process_chain:
 			i++;
 		}
 		for (; i < nchain; i++)
-			heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
+			heap_prune_record_unchanged_lp_normal(prstate, chainitems[i]);
 	}
 	else if (ndeadchain == nchain)
 	{
@@ -1296,7 +1303,7 @@ process_chain:
 
 		/* the rest of tuples in the chain are normal, unchanged tuples */
 		for (int i = ndeadchain; i < nchain; i++)
-			heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
+			heap_prune_record_unchanged_lp_normal(prstate, chainitems[i]);
 	}
 }
 
@@ -1421,7 +1428,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
  * Record an unused line pointer that is left unchanged.
  */
 static void
-heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_unused(PruneState *prstate, OffsetNumber offnum)
 {
 	Assert(!prstate->processed[offnum]);
 	prstate->processed[offnum] = true;
@@ -1432,9 +1439,10 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
  * update bookkeeping of tuple counts and page visibility.
  */
 static void
-heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
 	prstate->processed[offnum] = true;
@@ -1615,7 +1623,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
  * Record line pointer that was already LP_DEAD and is left unchanged.
  */
 static void
-heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 {
 	Assert(!prstate->processed[offnum]);
 	prstate->processed[offnum] = true;
-- 
2.43.0
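
The refactoring pattern in the patch above (look up the buffer, block
number, and page once during setup, then let helpers read them from the
state struct) can be sketched in isolation as follows. Everything here is
a simplified mock: MiniPruneState, the Mock* functions, and the lookup
counter are hypothetical names for illustration, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* mock of the PostgreSQL typedef */
typedef int Buffer;				/* mock: here just an index into mock_pages */
typedef char *Page;				/* mock page pointer */

/* Mock buffer manager that counts lookups so the effect is observable. */
static char mock_pages[4][8192];
int			mock_lookup_calls = 0;

BlockNumber
MockBufferGetBlockNumber(Buffer buf)
{
	mock_lookup_calls++;
	return (BlockNumber) buf * 10;
}

Page
MockBufferGetPage(Buffer buf)
{
	mock_lookup_calls++;
	return mock_pages[buf];
}

/* Hypothetical cut-down analogue of PruneState. */
typedef struct MiniPruneState
{
	Buffer		buffer;
	BlockNumber block;
	Page		page;
	int			nprocessed;
} MiniPruneState;

/* Resolve buffer-derived context exactly once. */
void
mini_prune_setup(MiniPruneState *st, Buffer buf)
{
	st->buffer = buf;
	st->block = MockBufferGetBlockNumber(buf);
	st->page = MockBufferGetPage(buf);
	st->nprocessed = 0;
}

/* Helper takes only the state; no separate page or block arguments. */
void
mini_record_item(MiniPruneState *st, int offnum)
{
	st->page[offnum] = 1;
	st->nprocessed++;
}
```

However many items the helper processes, the two lookups happen only in
mini_prune_setup(), and the helper signature shrinks to the state pointer
plus the offset.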



  [text/x-patch] v35-0002-Add-PageGetPruneXid-helper.patch (1.9K, 3-v35-0002-Add-PageGetPruneXid-helper.patch)
From aad49496321243eaab94d288da021c537b96f652 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 14:09:11 -0500
Subject: [PATCH v35 02/18] Add PageGetPruneXid helper

This is in line with other page header accessors. It improves readability
and avoids long lines.
---
 src/backend/access/heap/pruneheap.c | 4 ++--
 src/include/storage/bufpage.h       | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3c5d33834fc..1d61b336193 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -234,7 +234,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 * determining the appropriate horizon is a waste if there's no prune_xid
 	 * (i.e. no updates/deletes left potentially dead tuples around).
 	 */
-	prune_xid = ((PageHeader) page)->pd_prune_xid;
+	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
 		return;
 
@@ -868,7 +868,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint_prune = ((PageHeader) prstate.page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
 		PageIsFull(prstate.page);
 
 	/*
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index ae3725b3b81..92a6bb9b0c0 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -441,6 +441,12 @@ PageClearAllVisible(Page page)
 	((PageHeader) page)->pd_flags &= ~PD_ALL_VISIBLE;
 }
 
+static inline TransactionId
+PageGetPruneXid(const PageData *page)
+{
+	return ((const PageHeaderData *) page)->pd_prune_xid;
+}
+
 /*
  * These two require "access/transam.h", so left as macros.
  */
-- 
2.43.0
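
As a standalone illustration of the accessor style added above, the
sketch below wraps a header field read in a const-correct inline
function. MockPageHeaderData is a deliberately reduced, hypothetical
header (the real PageHeaderData has more fields), and the Mock* names
are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, reduced page header used only for this sketch. */
typedef struct MockPageHeaderData
{
	uint16_t	pd_flags;
	TransactionId pd_prune_xid;
} MockPageHeaderData;

typedef char MockPageData;

/*
 * Read pd_prune_xid without open-coding the cast at every call site.
 * Taking a const pointer documents that the accessor never writes to
 * the page.
 */
static inline TransactionId
MockPageGetPruneXid(const MockPageData *page)
{
	return ((const MockPageHeaderData *) page)->pd_prune_xid;
}
```

Call sites then read as MockPageGetPruneXid(page) rather than repeating
the cast and field access, which is the readability gain the commit
message describes.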



  [text/x-patch] v35-0003-Rename-PruneState-all_visible-all_frozen.patch (13.7K, 4-v35-0003-Rename-PruneState-all_visible-all_frozen.patch)
From 7038ae8d57ff2d5f63c2a306e34703a4b54c047a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 15:59:04 -0500
Subject: [PATCH v35 03/18] Rename PruneState->all_visible/all_frozen

to set_all_visible and set_all_frozen to clarify that this is the
proposed state of the all-visible and all-frozen bits for a heap page in
the visibility map, not the current state.

Author: Melanie Plageman <[email protected]>
Suggested-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c | 144 ++++++++++++++--------------
 1 file changed, 74 insertions(+), 70 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1d61b336193..fa5aa2a63f2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -148,22 +148,24 @@ typedef struct
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page after pruning.
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
 	 *
 	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
 	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
+	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
-	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to freeze the page or not.  The
+	 * set_all_visible and set_all_frozen values returned to the caller are
+	 * adjusted to include LP_DEAD items after we determine whether to
+	 * opportunistically freeze.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
+	bool		set_all_visible;
+	bool		set_all_frozen;
 	TransactionId visibility_cutoff_xid;
 } PruneState;
 
@@ -419,22 +421,22 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * setting the VM bits.
 	 *
 	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * also use 'set_all_visible' and 'set_all_frozen' for our own
+	 * decision-making. If the whole page would become frozen, we consider
+	 * opportunistically freezing tuples.  We will not be able to freeze the
+	 * whole page if there are tuples present that are not visible to everyone
+	 * or if there are dead tuples which are not yet removable.  However, dead
+	 * tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing.  Because of that, we do
+	 * not immediately clear set_all_visible and set_all_frozen when we see
+	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
+	 * correct set_all_visible and set_all_frozen before we return them to the
+	 * caller, so that the caller doesn't set the VM bits incorrectly.
 	 */
 	if (prstate->attempt_freeze)
 	{
-		prstate->all_visible = true;
-		prstate->all_frozen = true;
+		prstate->set_all_visible = true;
+		prstate->set_all_frozen = true;
 	}
 	else
 	{
@@ -442,8 +444,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 		 * Initializing to false allows skipping the work to update them in
 		 * heap_prune_record_unchanged_lp_normal().
 		 */
-		prstate->all_visible = false;
-		prstate->all_frozen = false;
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
 	}
 
 	/*
@@ -683,8 +685,8 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	 */
 	if (!prstate->attempt_freeze)
 	{
-		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -710,9 +712,9 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->set_all_frozen && prstate->nfrozen > 0)
 		{
-			Assert(prstate->all_visible);
+			Assert(prstate->set_all_visible);
 
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
@@ -752,7 +754,7 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * in the VM once we're done with it. Otherwise, we generate a
 		 * conservative cutoff by stepping back from OldestXmin.
 		 */
-		if (prstate->all_frozen)
+		if (prstate->set_all_frozen)
 			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
 		else
 		{
@@ -769,7 +771,7 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 */
 		Assert(!prstate->pagefrz.freeze_required);
 
-		prstate->all_frozen = false;
+		prstate->set_all_frozen = false;
 		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
 	}
 	else
@@ -804,11 +806,12 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
  * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
+ * presult->set_all_visible and presult->set_all_frozen after determining
+ * whether or not to opportunistically freeze, to indicate if the VM bits can
+ * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed, because at the moment only callers that also freeze
+ * need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -882,21 +885,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * While scanning the line pointers, we did not clear
-	 * all_visible/all_frozen when encountering LP_DEAD items because we
-	 * wanted the decision whether or not to freeze the page to be unaffected
-	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 * set_all_visible/set_all_frozen when encountering LP_DEAD items because
+	 * we wanted the decision whether or not to freeze the page to be
+	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
+	 * items are effectively assumed to be LP_UNUSED items in the making.  It
+	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
+	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
 	 *
 	 * Now that we finished determining whether or not to freeze the page,
-	 * update all_visible and all_frozen so that they reflect the true state
-	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 * update set_all_visible and set_all_frozen so that they reflect the true
+	 * state of the page for setting PD_ALL_VISIBLE and VM bits.
 	 */
 	if (prstate.lpdead_items > 0)
-		prstate.all_visible = prstate.all_frozen = false;
+		prstate.set_all_visible = prstate.set_all_frozen = false;
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -984,8 +987,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
+	presult->all_visible = prstate.set_all_visible;
+	presult->all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
@@ -1365,9 +1368,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	prstate->ndead++;
 
 	/*
-	 * Deliberately delay unsetting all_visible and all_frozen until later
-	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * Deliberately delay unsetting set_all_visible and set_all_frozen until
+	 * later during pruning. Removable dead tuples shouldn't preclude freezing
+	 * the page.
 	 */
 
 	/* Record the dead offset for vacuum */
@@ -1489,14 +1492,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->all_visible)
+			if (prstate->set_all_visible)
 			{
 				TransactionId xmin;
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
+					prstate->set_all_visible = false;
+					prstate->set_all_frozen = false;
 					break;
 				}
 
@@ -1511,15 +1514,16 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 
 				/*
 				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
+				 * we only update 'set_all_visible' and 'set_all_frozen' when
+				 * freezing is requested. We could use
+				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
+				 * caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
+					prstate->set_all_visible = false;
+					prstate->set_all_frozen = false;
 					break;
 				}
 
@@ -1532,8 +1536,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
-			prstate->all_frozen = false;
+			prstate->set_all_visible = false;
+			prstate->set_all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1552,8 +1556,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
-			prstate->all_frozen = false;
+			prstate->set_all_visible = false;
+			prstate->set_all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1571,8 +1575,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
-			prstate->all_frozen = false;
+			prstate->set_all_visible = false;
+			prstate->set_all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1614,7 +1618,7 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 		 * definitely cannot be set all-frozen in the visibility map later on.
 		 */
 		if (!totally_frozen)
-			prstate->all_frozen = false;
+			prstate->set_all_frozen = false;
 	}
 }
 
@@ -1637,10 +1641,10 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 	 * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
 	 * handled (handled here, or handled later on).
 	 *
-	 * Similarly, don't unset all_visible and all_frozen until later, at the
-	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
-	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * Similarly, don't unset set_all_visible and set_all_frozen until later,
+	 * at the end of heap_page_prune_and_freeze().  This will allow us to
+	 * attempt to freeze the page after pruning.  As long as we unset it
+	 * before updating the visibility map, this will be correct.
 	 */
 
 	/* Record the dead offset for vacuum */
-- 
2.43.0



  [text/x-patch] v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch (8.0K, 5-v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch)
From a3b91ab430e7af8b459c169181c1dc3f0f04c8bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 13:55:45 -0500
Subject: [PATCH v35 04/18] Use the newest to-be-frozen xid as the conflict
 horizon for freezing

Previously, WAL records that froze tuples used OldestXmin as the snapshot
conflict horizon. However, OldestXmin is at least as new as the newest
frozen tuple's xid. By tracking the newest to-be-frozen xid and using it as
the snapshot conflict horizon instead, we end up with an older horizon that
will result in fewer query cancellations on the standby.
---
 src/backend/access/heap/heapam.c    | 16 +++++++++++
 src/backend/access/heap/pruneheap.c | 44 ++++++++---------------------
 src/include/access/heapam.h         |  8 ++++++
 3 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a231563f0df..76f94fdfa5b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6781,6 +6781,10 @@ heap_inplace_unlock(Relation relation,
  * NB: Caller should avoid needlessly calling heap_tuple_should_freeze when we
  * have already forced page-level freezing, since that might incur the same
  * SLRU buffer misses that we specifically intended to avoid by freezing.
+ *
+ * We don't update FreezePageConflictXid here because lockers don't affect
+ * visibility on the standby, and we don't have to worry about the update XID
+ * because the only way it can be older than OldestXmin is if it aborted.
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
@@ -7173,7 +7177,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
 		/* Verify that xmin committed if and when freeze plan is executed */
 		if (freeze_xmin)
+		{
 			frz->checkflags |= HEAP_FREEZE_CHECK_XMIN_COMMITTED;
+			if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+				pagefrz->FreezePageConflictXid = xid;
+		}
 	}
 
 	/*
@@ -7192,6 +7200,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 */
 		replace_xvac = pagefrz->freeze_required = true;
 
+		if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+			pagefrz->FreezePageConflictXid = xid;
+
 		/* Will set replace_xvac flags in freeze plan below */
 	}
 
@@ -7316,7 +7327,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 * independent of this, since the lock is released at xact end.)
 		 */
 		if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+		{
 			frz->checkflags |= HEAP_FREEZE_CHECK_XMAX_ABORTED;
+			if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+				pagefrz->FreezePageConflictXid = xid;
+		}
 	}
 	else if (!TransactionIdIsValid(xid))
 	{
@@ -7499,6 +7514,7 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	cutoffs.MultiXactCutoff = MultiXactCutoff;
 
 	pagefrz.freeze_required = true;
+	pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	pagefrz.FreezePageRelfrozenXid = FreezeLimit;
 	pagefrz.FreezePageRelminMxid = MultiXactCutoff;
 	pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fa5aa2a63f2..07868dbcc17 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -114,13 +114,6 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
-	/*
-	 * The snapshot conflict horizon used when freezing tuples. The final
-	 * snapshot conflict horizon for the record may be newer if pruning
-	 * removes newer transaction IDs.
-	 */
-	TransactionId frz_conflict_horizon;
-
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -377,6 +370,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/* initialize page freezing working state */
 	prstate->pagefrz.freeze_required = false;
+	prstate->pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
@@ -407,7 +401,6 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * PruneState.
 	 */
 	prstate->deadoffsets = presult->deadoffsets;
-	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * Vacuum may update the VM after we're done.  We can keep track of
@@ -746,22 +739,8 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(prstate->buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->set_all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
+		Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
+											 prstate->cutoffs->OldestXmin));
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -886,11 +865,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	/*
 	 * While scanning the line pointers, we did not clear
 	 * set_all_visible/set_all_frozen when encountering LP_DEAD items because
-	 * we wanted the decision whether or not to freeze the page to be
-	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
-	 * items are effectively assumed to be LP_UNUSED items in the making.  It
-	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
-	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 * we wanted the decision whether or not to opportunistically freeze the
+	 * page to be unaffected by the short-term presence of LP_DEAD items.
+	 * These LP_DEAD items are effectively assumed to be LP_UNUSED items in
+	 * the making. It doesn't matter which vacuum heap pass (initial pass or
+	 * final pass) ends up setting the page all-frozen, as long as the ongoing
+	 * VACUUM does it.
 	 *
 	 * Now that we finished determining whether or not to freeze the page,
 	 * update set_all_visible and set_all_frozen so that they reflect the true
@@ -953,7 +933,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * The snapshotConflictHorizon for the whole record should be the
 			 * most conservative of all the horizons calculated for any of the
 			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
+			 * transactions on the standby older than the youngest xid of the
 			 * most recently removed tuple this record will prune will
 			 * conflict.  If this record will freeze tuples, any transactions
 			 * on the standby with xids older than the youngest tuple this
@@ -961,9 +941,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			TransactionId conflict_xid;
 
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
+			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
 									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
+				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..fae79b37f0d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -208,6 +208,14 @@ typedef struct HeapPageFreeze
 	TransactionId FreezePageRelfrozenXid;
 	MultiXactId FreezePageRelminMxid;
 
+	/*
+	 * The youngest XID that will be frozen or removed during freezing. It is
+	 * used to calculate the snapshot conflict horizon for a WAL record
+	 * freezing tuples. Because it is only used if we do end up freezing
+	 * tuples, there is no need for a "no freeze" version.
+	 */
+	TransactionId FreezePageConflictXid;
+
 	/*
 	 * "No freeze" NewRelfrozenXid/NewRelminMxid trackers.
 	 *
-- 
2.43.0
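
The tracking added by this patch amounts to keeping a running maximum under
wraparound-aware XID comparison. Below is a minimal standalone sketch of that
idea, not the patch's code: ToyXid, toy_xid_follows() and
toy_freeze_conflict_xid() are hypothetical names that only imitate the
behavior of TransactionIdFollows() and the FreezePageConflictXid updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's TransactionId definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID      ((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)

/*
 * Returns nonzero if a is logically newer than b.  Normal XIDs are compared
 * modulo 2^32; if either one is a special XID, plain unsigned comparison is
 * used instead.
 */
static int
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (a < TOY_FIRST_NORMAL_XID || b < TOY_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Newest XID among those a freeze plan would freeze or remove. */
static ToyXid
toy_freeze_conflict_xid(const ToyXid *xids, size_t nxids)
{
	ToyXid		conflict = TOY_INVALID_XID;

	assert(xids != NULL || nxids == 0);
	for (size_t i = 0; i < nxids; i++)
	{
		/* Same shape as the updates in heap_prepare_freeze_tuple() */
		if (toy_xid_follows(xids[i], conflict))
			conflict = xids[i];
	}
	return conflict;
}
```

For example, with to-be-frozen XIDs 100, 250 and 180 and an OldestXmin of
1000, the sketch yields 250, whereas the removed code path that stepped back
from OldestXmin would have used 999.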



  [text/x-patch] v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch (5.9K, 6-v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch)
From 09b9cc477d8d9b689888566b9d4dced5eefea208 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:23:57 -0500
Subject: [PATCH v35 05/18] Save vmbuffer in heap-specific scan descriptors for
 on-access pruning

Future commits will use the visibility map in on-access pruning to avoid
pruning when a page is all-visible, fix VM corruption, and set the VM if
the page is all-visible.

Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.
---
 src/backend/access/heap/heapam.c         | 12 +++++++++++-
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++--
 src/backend/access/heap/pruneheap.c      |  2 +-
 src/include/access/heapam.h              | 19 ++++++++++++++++---
 4 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 76f94fdfa5b..e19209f180d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1310,6 +1310,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1348,6 +1349,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1380,6 +1387,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..47624194f93 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								&hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2533,7 +2541,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07868dbcc17..5ce3e54a036 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -209,7 +209,7 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fae79b37f0d..4e2e71be558 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. The current heap block's
+	 * corresponding page in the visibility map.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +122,14 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/* Current heap block's corresponding page in the visibility map */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +429,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
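
To illustrate why keeping the vmbuffer in the scan descriptor makes pinning
cheap, here is a toy model that counts how often a new VM page would have to
be pinned during a sequential pass. It is only a sketch: ToyScan and the
toy_* functions are hypothetical, and TOY_HEAPBLOCKS_PER_VM_PAGE assumes the
default 8kB block size with two VM bits per heap block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: heap blocks covered by one VM page at the default block size */
#define TOY_HEAPBLOCKS_PER_VM_PAGE 32672u
#define TOY_INVALID_VM_PAGE        UINT32_MAX

/* Hypothetical scan descriptor holding the cached VM page */
typedef struct ToyScan
{
	uint32_t	cached_vm_page; /* plays the role of rs_vmbuffer */
	unsigned	pin_calls;		/* how many times we had to (re)pin */
} ToyScan;

static void
toy_scan_begin(ToyScan *scan)
{
	scan->cached_vm_page = TOY_INVALID_VM_PAGE;
	scan->pin_calls = 0;
}

/* Pin the VM page covering heap_blk unless it is already cached. */
static void
toy_vm_pin(ToyScan *scan, uint32_t heap_blk)
{
	uint32_t	vm_page = heap_blk / TOY_HEAPBLOCKS_PER_VM_PAGE;

	if (scan->cached_vm_page == vm_page)
		return;					/* already holding the right page */

	scan->cached_vm_page = vm_page;
	scan->pin_calls++;
}

/* Visit heap blocks 0..nblocks-1 in order and report the pins needed. */
static unsigned
toy_count_vm_pins(uint32_t nblocks)
{
	ToyScan		scan;

	toy_scan_begin(&scan);
	for (uint32_t blk = 0; blk < nblocks; blk++)
		toy_vm_pin(&scan, blk);
	return scan.pin_calls;
}
```

Under these assumptions, a pass over 100000 heap blocks needs only 4 pins,
compared with one pin and unpin per heap page if nothing were cached.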



  [text/x-patch] v35-0006-Fix-visibility-map-corruption-in-more-cases.patch (18.3K, 7-v35-0006-Fix-visibility-map-corruption-in-more-cases.patch)
From c6a1fa5c8319779b800f903e24d3f239e16c1cc1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v35 06/18] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples or tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once,
and a single VM page covers a large number of heap pages.
---
 src/backend/access/heap/pruneheap.c  | 174 +++++++++++++++++++++++++--
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 100 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5ce3e54a036..fa470f663b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -207,6 +224,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not yet pinned and pruning is performed, vmbuffer will be
+ * pinned. If we find VM corruption during pruning, we will fix it.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -273,6 +293,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -280,14 +310,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -350,6 +373,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -766,6 +795,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items, nor can
+ * it have tuples that are not visible to all running transactions. This
+ * function clears the VM bits and resets the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
+ * page, but it does not need to be done in a critical section because
+ * clearing the VM is not WAL-logged.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("LP_DEAD item found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear.  However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck with buffer lock before concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
+						prstate->block,
+						RelationGetRelationName(prstate->relation))));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -826,6 +939,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -970,6 +1087,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->all_visible = prstate.set_all_visible;
 	presult->all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1292,7 +1410,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1302,6 +1421,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1385,6 +1511,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1524,7 +1659,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1539,6 +1675,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1563,7 +1703,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1629,6 +1770,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5b6f2441f6b..0a0aa8e5a9e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,11 +424,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1963,81 +1958,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2069,6 +1989,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2178,18 +2099,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e2e71be558..9db92c7db8a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -258,6 +258,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -319,6 +325,12 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits holds the page's VM bits as read from vmbuffer at the start of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
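

For readers following the hunk above: the two consistency checks that used
to live in identify_and_fix_vm_corruption() can be sketched in isolation
roughly as below. This is only an illustration under stated assumptions, not
code from the patch set. VmCheckResult and check_vm_consistency() are
hypothetical names, and plain bools and ints stand in for the real Page and
Buffer arguments. The bit values are meant to mirror visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values intended to mirror PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical result type for this sketch; not a PostgreSQL type. */
typedef enum
{
	VM_CONSISTENT,
	VM_SET_BUT_PAGE_NOT_ALL_VISIBLE,
	PAGE_ALL_VISIBLE_WITH_LPDEAD
} VmCheckResult;

/*
 * Classify the relationship between the page-level all-visible hint and
 * the page's VM bits. On either kind of corruption the cached VM bits are
 * cleared, as the removed function did with *vmbits.
 */
static VmCheckResult
check_vm_consistency(bool page_all_visible, int nlpdead_items,
					 uint8_t *vmbits)
{
	/* A VM bit must never be set while the page-level bit is clear. */
	if (!page_all_visible && (*vmbits & VISIBILITYMAP_VALID_BITS) != 0)
	{
		*vmbits = 0;
		return VM_SET_BUT_PAGE_NOT_ALL_VISIBLE;
	}

	/* An all-visible page must never contain LP_DEAD items. */
	if (page_all_visible && nlpdead_items > 0)
	{
		*vmbits = 0;
		return PAGE_ALL_VISIBLE_WITH_LPDEAD;
	}

	return VM_CONSISTENT;
}
```

The real code additionally emits a WARNING, clears the bits in the VM buffer
itself and, in the second case, clears PD_ALL_VISIBLE on the heap page. The
sketch leaves all of that out.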



  [text/x-patch] v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.3K, 8-v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 7e8ea684a4c6ee5d4b7169ec3195be75e76172e9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v35 07/18] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages that need no pruning
or freezing. To avoid that wasted work, exit early if the page is already
all-frozen, or if it is all-visible and no freezing will be attempted.
---
 src/backend/access/heap/pruneheap.c | 73 +++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fa470f663b7..73db45f8dfd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -880,6 +881,66 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	presult->vmbits = prstate->vmbits;
+
+	if (!PageIsEmpty(page))
+		presult->hastup = true;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -943,6 +1004,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
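

The bypass condition and the live-tuple count from the patch above can be
restated as a small standalone sketch. This is a simplified illustration,
not the patch's code: LpStateSketch, can_bypass_prune_freeze() and
count_normal_items() are hypothetical names, and an array of enum values
stands in for the page's line pointer array. The VM bit values are meant to
mirror visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values intended to mirror PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the state of a line pointer (ItemId). */
typedef enum
{
	SK_LP_UNUSED,
	SK_LP_NORMAL,
	SK_LP_REDIRECT,
	SK_LP_DEAD
} LpStateSketch;

/*
 * The fast path applies if the page is already all-frozen, or if it is
 * all-visible and the caller will not attempt freezing.
 */
static bool
can_bypass_prune_freeze(uint8_t vmbits, bool attempt_freeze)
{
	if (vmbits & VISIBILITYMAP_ALL_FROZEN)
		return true;

	return (vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 && !attempt_freeze;
}

/*
 * On an all-visible page every normal line pointer refers to a live tuple,
 * so counting normal items is enough for a live tuple count.
 */
static int
count_normal_items(const LpStateSketch *items, int nitems)
{
	int			live = 0;

	for (int i = 0; i < nitems; i++)
	{
		if (items[i] == SK_LP_NORMAL)
			live++;
	}

	return live;
}
```

Note that an all-visible but not all-frozen page still takes the slow path
when freezing is requested, since there may be tuples left to freeze.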



  [text/x-patch] v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 9-v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From f271209e3feb75f79e94b83c3d564e5d14d1b9bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v35 08/18] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..aee88947393 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1060,6 +1060,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 73db45f8dfd..7b72804a3e5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1024,6 +1024,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1692,29 +1703,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0a0aa8e5a9e..6c7807d5bd3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -460,13 +460,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2053,13 +2053,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2815,7 +2812,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3576,14 +3573,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3604,7 +3601,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3623,7 +3620,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3704,7 +3701,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3713,16 +3710,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3751,6 +3749,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9db92c7db8a..e401dd52e25 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -474,6 +474,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
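

The once-per-page check described in the commit message above can be
sketched as follows: track the newest normal xmin while scanning, then do a
single horizon test at the end. This is a toy model under loud assumptions,
not the patch's code. A single horizon XID stands in for GlobalVisState,
xid_maybe_running() is a hypothetical stand-in for
GlobalVisTestXidMaybeRunning(), and every xmin in the array is assumed to
belong to a committed, live tuple. The XID comparisons follow the usual
modulo-2^32 scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b"; special XIDs compare plainly. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* Wraparound-aware "a is newer than b". */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical: an XID may be running unless it is older than the horizon. */
static bool
xid_maybe_running(TransactionId horizon, TransactionId xid)
{
	return !xid_precedes(xid, horizon);
}

/*
 * Scan the xmins of the live tuples, remembering only the newest normal
 * one, then test that single XID against the horizon. If the newest xmin
 * is visible to everyone, so are all the older ones.
 */
static bool
page_is_all_visible(const TransactionId *xmins, size_t ntuples,
					TransactionId horizon, TransactionId *newest_live_xid)
{
	*newest_live_xid = InvalidTransactionId;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (xid_is_normal(xmins[i]) &&
			xid_follows(xmins[i], *newest_live_xid))
			*newest_live_xid = xmins[i];
	}

	if (xid_is_normal(*newest_live_xid) &&
		xid_maybe_running(horizon, *newest_live_xid))
		return false;

	return true;
}
```

With xmins {100, 250, 180}, the sketch performs one horizon test against 250
rather than three separate tests, which is the saving the commit message
describes for the more expensive GlobalVisState comparison.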



  [text/x-patch] v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 10-v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From f70d52103e8f665de92bd531ff3a261b0142d20d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v35 09/18] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Earlier version reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7b72804a3e5..dd731f64bc6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -433,53 +430,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate->set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -709,7 +688,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -962,9 +940,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1030,9 +1007,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1184,7 +1161,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1644,6 +1621,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1691,32 +1669,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6c7807d5bd3..b5370ec26da 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -462,7 +462,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -470,7 +470,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2788,7 +2788,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2814,14 +2814,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2862,7 +2862,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3575,7 +3575,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3583,7 +3583,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3606,7 +3606,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3624,7 +3624,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3634,7 +3634,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3723,9 +3723,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3755,8 +3755,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



  [text/x-patch] v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (27.3K, 11-v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From 479ab7c11c1e48c938934706acf21cff460297c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v35 10/18] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 321 ++++++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 107 +--------
 src/include/access/heapam.h          |  37 ++-
 3 files changed, 266 insertions(+), 199 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd731f64bc6..d41e1c6fce4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,6 +213,12 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId newest_frozen_xid,
+									  TransactionId newest_live_xid);
 
 
 /*
@@ -373,9 +383,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -445,7 +456,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * Currently, only VACUUM performs freezing, but other callers may in the
-	 * future. Other callers must initialize prstate.all_frozen to false,
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
 	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
 	 *
 	 * We only consider opportunistic freezing if the page would become
@@ -774,6 +785,66 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed,
+				 TransactionId newest_frozen_xid,
+				 TransactionId newest_live_xid)
+{
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be seen as frozen by all MVCC snapshots on the standby (any
+	 * conflict would have been handled in reaction to the WAL record freezing
+	 * those tuples).
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshot conflict horizon for the whole record should be the most
+	 * conservative (newest) of all the horizons calculated for any of the
+	 * possible modifications. If this record will prune tuples, any queries
+	 * on the standby with xmin older than the youngest XID of the most
+	 * recently removed tuple this record will prune will conflict.  If this
+	 * record will freeze tuples, any queries on the standby with xmin older
+	 * than the youngest tuple this record will freeze will conflict.
+	 *
+	 * If we are setting the VM, the conflict horizon is almost always the
+	 * newest live XID, except in the situation described above.
+	 *
+	 * By picking the newest of all of those, we can ensure that all changes
+	 * in the record have been taken into account.
+	 */
+	if (do_set_vm)
+		conflict_xid = newest_live_xid;
+	if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
+		conflict_xid = newest_frozen_xid;
+
+	/*
+	 * If we are removing tuples with a younger XID than our so far calculated
+	 * conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+	{
+		Assert(do_prune);
+		conflict_xid = latest_xid_removed;
+	}
+
+	return conflict_xid;
+}
+
 /*
  * Helper to fix visibility-related corruption on a heap page and its
  * corresponding VM page. An all-visible page cannot have dead items nor can
@@ -839,7 +910,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -856,7 +927,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for the heap page using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -885,8 +992,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
@@ -913,15 +1020,14 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
 
-	presult->vmbits = prstate->vmbits;
-
 	if (!PageIsEmpty(page))
 		presult->hastup = true;
 }
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -936,12 +1042,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page
+ * is found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -969,15 +1073,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -986,8 +1092,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1058,6 +1164,25 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.old_vmbits, prstate.new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.pagefrz.FreezePageConflictXid,
+									prstate.newest_live_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1079,14 +1204,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1100,6 +1228,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1107,29 +1256,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xid of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1139,33 +1271,64 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.set_all_visible;
-	presult->all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b5370ec26da..4678e0b9c26 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -458,13 +458,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1995,8 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2037,29 +2028,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2080,6 +2048,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2093,71 +2069,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3571,7 +3482,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e401dd52e25..7ef4cbbfb1e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -260,7 +260,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -276,8 +277,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -311,25 +311,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -466,7 +453,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0



  [text/x-patch] v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 12-v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 410c0e06c85c4d686f114635b0044549dc22eceb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v35 11/18] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4678e0b9c26..68fa77b5318 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1902,9 +1902,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1922,13 +1925,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 13-v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 44626ffe27eddbd1dea7851b10079c150069faf7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v35 12/18] Remove XLOG_HEAP2_VISIBLE entirely

No remaining callers emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

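This changes the visibility map API (visibilitymap_set() now takes the
arguments that visibilitymap_set_vmbits() used to take), so external
callers of visibilitymap_set() and consumers of the VM-only WAL record
will need to change.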

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..bce767d7b71 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e19209f180d..2f9ef87463e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8894,50 +8894,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 6d39a5fff7c..df89f93edb4 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1367,9 +1230,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d41e1c6fce4..b66d49f4d60 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1245,8 +1245,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68fa77b5318..ef607945a93 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1925,11 +1925,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2793,9 +2793,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index d83afbfb9d6..afacc1b8e0d 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,12 +476,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 77e3c04144e..f5cbcf084a4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4356,7 +4356,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
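
For readers following the API change in this patch: after it,
visibilitymap_set() only sets bits in a VM buffer that the caller has already
pinned and exclusively locked, and leaves WAL logging to the caller. Below is
a minimal standalone sketch of that bit arithmetic, modeled on the
status-read and OR lines visible in the removed function body above. It is a
toy, not PostgreSQL code: ToyVmPage, toy_vm_init(), toy_vm_get(),
toy_vm_set(), the TOY_* layout constants and the "dirty" field are all
hypothetical names invented for illustration, and the 16-byte map is an
assumption chosen to keep the example small.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as shown in visibilitymapdefs.h in the diff context. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Toy layout constants: assumptions for this sketch only. */
#define TOY_BITS_PER_HEAPBLOCK  2
#define TOY_HEAPBLOCKS_PER_BYTE 4
#define TOY_MAP_BYTES           16  /* one tiny "VM page": 64 heap blocks */

/* Hypothetical stand-in for a pinned, exclusively locked VM buffer. */
typedef struct ToyVmPage
{
    uint8_t map[TOY_MAP_BYTES];
    int     dirty;              /* stand-in for marking the buffer dirty */
} ToyVmPage;

static void
toy_vm_init(ToyVmPage *vm)
{
    memset(vm, 0, sizeof(*vm));
}

/* Return the two status bits recorded for heapBlk. */
static uint8_t
toy_vm_get(const ToyVmPage *vm, uint32_t heapBlk)
{
    uint32_t mapByte = heapBlk / TOY_HEAPBLOCKS_PER_BYTE;
    uint8_t  mapOffset = (uint8_t) ((heapBlk % TOY_HEAPBLOCKS_PER_BYTE) *
                                    TOY_BITS_PER_HEAPBLOCK);

    assert(mapByte < TOY_MAP_BYTES);
    return (vm->map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
}

/*
 * Set status bits for heapBlk. No WAL is emitted here; in this model, as in
 * the patched API, logging is the caller's job.
 */
static void
toy_vm_set(ToyVmPage *vm, uint32_t heapBlk, uint8_t flags)
{
    uint32_t mapByte = heapBlk / TOY_HEAPBLOCKS_PER_BYTE;
    uint8_t  mapOffset = (uint8_t) ((heapBlk % TOY_HEAPBLOCKS_PER_BYTE) *
                                    TOY_BITS_PER_HEAPBLOCK);

    assert(mapByte < TOY_MAP_BYTES);
    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    /* Never set all-frozen without also setting all-visible. */
    assert(flags != VISIBILITYMAP_ALL_FROZEN);

    /* Only touch (and dirty) the page if something actually changes. */
    if (flags != toy_vm_get(vm, heapBlk))
    {
        vm->map[mapByte] |= (uint8_t) (flags << mapOffset);
        vm->dirty = 1;
    }
}
```

In this toy layout, setting both flags for heap block 5 touches byte 1 of the
map at bit offset 2, so that byte becomes 0x0C while the entries for
neighbouring blocks stay clear.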



  [text/x-patch] v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch (924B, 14-v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch)
From f24da3eaa6c3587bb0621817b78c148af0393349 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 16:48:19 -0500
Subject: [PATCH v35 13/18] Initialize missing fields in CreateExecutorState()

d47cbf474ecbd449a4 forgot to initialize a few fields it introduced in
the EState, so do that now.
---
 src/backend/executor/execUtils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..cd4d5452cfb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,9 @@ CreateExecutorState(void)
 	estate->es_rteperminfos = NIL;
 	estate->es_plannedstmt = NULL;
 	estate->es_part_prune_infos = NIL;
+	estate->es_part_prune_states = NIL;
+	estate->es_part_prune_results = NIL;
+	estate->es_unpruned_relids = NULL;
 
 	estate->es_junkFilter = NULL;
 
-- 
2.43.0



  [text/x-patch] v35-0014-Track-which-relations-are-modified-by-a-query.patch (5.4K, 15-v35-0014-Track-which-relations-are-modified-by-a-query.patch)
From 382cdd7f98291e00e0fe11c53a32e2b64396fd8e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v35 14/18] Track which relations are modified by a query

Save the relids in a bitmap in the EState. A later commit will pass this
information down to scan nodes to control whether the scan may set the
visibility map during on-access pruning. We don't want to set the
visibility map if the query is just going to modify the page immediately
afterward.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execMain.c  | 13 +++++++++++++
 src/backend/executor/execUtils.c | 32 ++++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 54 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..6f51b82a364 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_range_table_size = parentestate->es_range_table_size;
 	rcestate->es_relations = parentestate->es_relations;
 	rcestate->es_rowmarks = parentestate->es_rowmarks;
+	rcestate->es_modified_relids = parentestate->es_modified_relids;
 	rcestate->es_rteperminfos = parentestate->es_rteperminfos;
 	rcestate->es_plannedstmt = parentestate->es_plannedstmt;
 	rcestate->es_junkFilter = parentestate->es_junkFilter;
@@ -3165,6 +3174,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..b4e95644404 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,34 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+	Index		rti;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +926,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..05f032baeaa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -703,6 +703,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
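
For readers following along: the assertion in CrossCheckModifiedRelids() above is a subset test. Every opened result relation and every relation with a row mark must already appear in es_modified_relids. Below is a minimal standalone sketch of that check, assuming a 64-bit mask stands in for a Bitmapset (so RT indexes must stay below 64). All demo_* names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every bit set in a is also set in b (stand-in for bms_is_subset). */
static bool
demo_is_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * result_rtis: 1-based RT indexes of the opened result relations.
 * has_rowmark: has_rowmark[rti - 1] is true if that relation has a row
 * mark; may be NULL when the query has no row marks at all.
 * Returns true when modified_relids covers everything we expect.
 */
static bool
demo_cross_check(const unsigned *result_rtis, size_t nresult,
				 const bool *has_rowmark, unsigned range_table_size,
				 uint64_t modified_relids)
{
	uint64_t	expected = 0;

	for (size_t i = 0; i < nresult; i++)
		expected |= UINT64_C(1) << result_rtis[i];

	if (has_rowmark != NULL)
	{
		for (unsigned rti = 1; rti <= range_table_size; rti++)
			if (has_rowmark[rti - 1])
				expected |= UINT64_C(1) << rti;
	}

	return demo_is_subset(expected, modified_relids);
}
```

Note the direction of the test: es_modified_relids may contain more members than the expected set, it just may not contain fewer.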



  [text/x-patch] v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch (21.2K, 16-v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch)
From 61ce0d481c14b6203efdb7fa77949e777505d613 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v35 15/18] Make begin_scan() functions take a flags argument

This lets us pass more information from the executor to use when
building the scan descriptor. A future commit will use this to tell the
scan descriptor whether or not its relation is read-only in the current
query.
---
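The pattern this patch applies to table_beginscan() and friends is that the caller-supplied flags are OR'ed with the scan-type defaults instead of the defaults being a fresh local variable. A rough standalone sketch of that shape follows, assuming made-up bit values; the DEMO_* constants and demo_beginscan_flags() are hypothetical stand-ins, not the real ScanOptions definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ScanOptions bits; values are illustrative only. */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_STRAT = 1 << 1,
	DEMO_SO_ALLOW_SYNC = 1 << 2,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 3,
	DEMO_SO_CALLER_HINT = 1 << 4	/* some extra bit a caller may pass */
};

/*
 * Mirrors the shape of the patched table_beginscan(): the defaults for a
 * sequential scan are added to whatever the caller passed, so caller bits
 * survive and passing 0 keeps the old behavior.
 */
static uint32_t
demo_beginscan_flags(uint32_t flags)
{
	flags |= DEMO_SO_TYPE_SEQSCAN |
		DEMO_SO_ALLOW_STRAT | DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE;

	return flags;
}
```

This is why every existing call site in the diff below can simply pass 0 as the new last argument without changing behavior.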
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  6 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 ++++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 ++++----
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      |  4 ++--
 src/backend/executor/nodeSeqscan.c        |  6 +++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/tableam.h              | 17 +++++++++--------
 22 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 9cd563fd0c3..eea24eb7116 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index ee9b6106922..977308f7282 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2060,7 +2060,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 47624194f93..ebe2e87a28b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 43f64a0e721..1827208396c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index fd9d4087b5a..cc486e66793 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1926,7 +1926,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..b3aeee36ce6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..c5cbc5b4e1f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1158,7 +1158,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b04b0dbd2a0..654cc7db175 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6388,7 +6388,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13765,7 +13765,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c68c26cbf38..106bcd3301c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -107,7 +107,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..a7af2f6628a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index dd7e11c0ca5..3da2db74e88 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7186,7 +7186,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e881e4f82a0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -420,7 +420,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +894,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +939,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1139,7 +1139,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1175,7 +1176,7 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1186,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v35-0016-Pass-down-information-on-table-modification-to-s.patch (8.0K, 17-v35-0016-Pass-down-information-on-table-modification-to-s.patch)
From 1b41b0a89323c45652965d2e11afd729bdb2c1c7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v35 16/18] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
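Each scan node in this patch makes the same decision: if the scanned relation's RT index is not in es_modified_relids, pass SO_HINT_REL_READ_ONLY, otherwise pass 0. A minimal standalone sketch of that decision is below, assuming a 64-bit mask in place of a Bitmapset (RT indexes below 64 only). DemoEState and the demo_* functions are hypothetical; only the bit position 1 << 10 is taken from the tableam.h hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns to SO_HINT_REL_READ_ONLY. */
#define DEMO_SO_HINT_REL_READ_ONLY (UINT32_C(1) << 10)

/* Hypothetical, heavily reduced stand-in for EState. */
typedef struct DemoEState
{
	uint64_t	modified_relids;	/* bit N set => RT index N is modified */
} DemoEState;

/* Analogous to the bms_add_member() call in ExecInitResultRelation(). */
static void
demo_mark_modified(DemoEState *estate, unsigned rti)
{
	estate->modified_relids |= UINT64_C(1) << rti;
}

/* Flags a scan node would pass to its begin-scan call. */
static uint32_t
demo_scan_flags(const DemoEState *estate, unsigned scanrelid)
{
	bool		modified = ((estate->modified_relids >> scanrelid) & 1) != 0;

	return modified ? 0 : DEMO_SO_HINT_REL_READ_ONLY;
}

/* Inverse mapping, as done in heapam_index_fetch_begin() in this patch. */
static bool
demo_modifies_base_rel(uint32_t flags)
{
	return !(flags & DEMO_SO_HINT_REL_READ_ONLY);
}
```

The hint defaults to "may be modified": callers that pass 0, as all non-executor call sites do, never get the read-only treatment.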
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 +++++++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              |  2 ++
 7 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ebe2e87a28b..3a8eb9d8b61 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 106bcd3301c..1017676fce0 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -103,11 +103,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index cf4d9a4f832..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a7af2f6628a..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ef4cbbfb1e..c20218f8190 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -130,6 +130,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e881e4f82a0..599011ba567 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0



  [text/x-patch] v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 18-v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 88a86dbfc54db38c890718d74419d94f15dade18 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v35 17/18] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 41 +++++++++++++++----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 ++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2f9ef87463e..5539bb8c10b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3a8eb9d8b61..eb5a1b7bd21 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b66d49f4d60..fc2ddcb5ab4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
 									  uint8 old_vmbits, uint8 new_vmbits,
 									  TransactionId latest_xid_removed,
@@ -237,7 +240,8 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * pinned. If we find VM corruption during pruning, we will fix it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -319,6 +323,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -375,6 +381,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -937,21 +944,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1166,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ef607945a93..ab76800b4df 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2007,7 +2007,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c20218f8190..0a3e3df9b2d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -96,7 +97,8 @@ typedef struct HeapScanDescData
 
 	/*
 	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * corresponding page in the visibility map. If the relation is not
+	 * modified by the query, on-access pruning may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -128,7 +130,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -435,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
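
The decision that this patch adds to heap_page_will_set_vm() can be sketched as a standalone predicate. This is only an illustration under stated assumptions: PruneReasonSketch, PageStateSketch, and will_set_vm_sketch() are hypothetical names, and the two booleans buffer_is_dirty and needs_fpi stand in for the results of BufferIsDirty() and XLogCheckBufferNeedsBackup(). The real function also resets set_all_visible/set_all_frozen and fills in new_vmbits, which the sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PruneReason values used in the diff. */
typedef enum
{
	PRUNE_ON_ACCESS,
	PRUNE_VACUUM_SCAN
} PruneReasonSketch;

/* Hypothetical snapshot of the inputs the heuristic looks at. */
typedef struct
{
	bool		attempt_set_vm;		/* caller passed HEAP_PAGE_PRUNE_SET_VM */
	bool		set_all_visible;	/* page would be all-visible after pruning */
	bool		buffer_is_dirty;	/* stand-in for BufferIsDirty() */
	bool		needs_fpi;			/* stand-in for XLogCheckBufferNeedsBackup() */
} PageStateSketch;

/*
 * Returns true if the VM should be set. On-access callers that neither
 * prune nor freeze skip the VM update when it would newly dirty the heap
 * page or force a full-page image into the WAL.
 */
static bool
will_set_vm_sketch(const PageStateSketch *st, PruneReasonSketch reason,
				   bool do_prune, bool do_freeze)
{
	if (!st->attempt_set_vm)
		return false;

	if (!st->set_all_visible)
		return false;

	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_is_dirty || st->needs_fpi))
		return false;

	return true;
}
```

With these assumptions, a clean all-visible page reached on access without any pruning is left alone, while the same page reached by vacuum, or reached on access with pruning already planned, gets its VM bit set.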



  [text/x-patch] v35-0018-Set-pd_prune_xid-on-insert.patch (9.3K, 19-v35-0018-Set-pd_prune_xid-on-insert.patch)
From 815a2d10ebc6f672be5508a0c4a98ff866d0d71b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v35 18/18] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run and set the VM
all-visible after a page is filled with newly inserted tuples the first
time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c              | 31 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 17 +++++++++-
 src/backend/access/heap/pruneheap.c           | 14 ++++-----
 .../modules/index/expected/killtuples.out     |  8 ++---
 4 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5539bb8c10b..bb124bc767b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,29 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2240,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2604,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index df89f93edb4..edd5c946c6a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fc2ddcb5ab4..72a1c311bd0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1904,16 +1904,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-03 00:04  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-03-03 00:04 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Feb 20, 2026 at 4:34 PM Andres Freund <[email protected]> wrote:
>
> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
> > Subject: [PATCH v34 13/14] Allow on-access pruning to set pages all-visible
> >
> > Many queries do not modify the underlying relation. For such queries, if
> > on-access pruning occurs during the scan, we can check whether the page
> > has become all-visible and update the visibility map accordingly.
> > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> > all-frozen.
> >
> > This commit implements on-access VM setting for sequential scans as well
> > as for the underlying heap relation in index scans and bitmap heap
> > scans.
>
> For evaluating this, did you build anything that evaluates the frequency of
> this succeeding, causing unnecessary un-all-visibling etc during benchmarks?

I didn't develop a specific micro-benchmark for this, but I did run
some generic pgbench workloads (which do a single tuple update on
accounts followed by a select) because I thought there would be a good
amount of un-all-visibling there. I didn't gather stats to confirm,
though, and it is hard to say with a random data distribution (IIRC it
was a relatively small working set, but still). I can develop
something more targeted, though.

> > @@ -631,7 +632,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
> >       /*
> >        * Prune and repair fragmentation for the whole page, if possible.
> >        */
> > -     heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
> > +     if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
> > +             vmbuffer = &scan->rs_vmbuffer;
> > +     heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
>
> I don't love that the signalling to heap_page_prune_opt() about this is by
> passing vmbuffer or NULL.

v35 is more explicit and heap_page_prune_opt() has a rel_read_only flag.
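
As a rough illustration of that plumbing (a sketch, not the patch itself): the two flag values below are copied from the v35 diffs, while scan_rel_read_only() and on_access_prune_options() are hypothetical helpers that only mirror what the scan code and heap_page_prune_opt() do with them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the v35 diffs; everything else is a stand-in. */
#define SO_HINT_REL_READ_ONLY	(1u << 10)
#define HEAP_PAGE_PRUNE_SET_VM	(1 << 2)

/*
 * Hypothetical helper: turn the scan option bit into the boolean that a
 * scan would pass as rel_read_only.
 */
static bool
scan_rel_read_only(uint32_t rs_flags)
{
	return (rs_flags & SO_HINT_REL_READ_ONLY) != 0;
}

/*
 * Hypothetical helper: the prune options on-access pruning ends up with.
 * Only read-only scans ask pruning to consider setting the VM.
 */
static int
on_access_prune_options(bool rel_read_only)
{
	int			options = 0;

	if (rel_read_only)
		options = HEAP_PAGE_PRUNE_SET_VM;

	return options;
}
```

The point of the explicit boolean is that the VM buffer can always be passed (and pinned) while the decision to set the VM stays a separate, visible input.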

> We clearly don't want to actually freeze rows if we're doing an update and
> might just update the rows again. But it's less clear to me that, if we are
> pruning dead row versions *and* the page is already all-visible after that
> (say because only HOT versions were removed), we shouldn't mark the page as
> such?

If we're doing an update and the new tuple fits on the same page, then
the page will not be all-visible by the time the update is over,
right? And if the new tuple doesn't fit on the same page as the old
tuple, then while it would be nice to mark the old page as
all-visible, don't we on-access prune the page before actually
updating the tuple? That is, we read in the old page in order to
update it, on-access prune it at that point to make space, and only
then make the page modification.

> > @@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
> >                               .cutoffs = NULL,
> >                       };
> >
> > +                     if (vmbuffer)
> > +                     {
> > +                             visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
> > +                             params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
> > +                             params.vmbuffer = *vmbuffer;
>
> Why do we pin the buffer at this time, rather than deferring that until we
> actually need it?  I guess we just always will access it, but that doesn't
> seem like it's inherent (c.f. my earlier points about a faster exit when
> looking at an already all-frozen page or such).

We would need to pin the VM to see if it is all-frozen to exit early.
For the on-access case, since we won't freeze, we could rely on
PD_ALL_VISIBLE to exit early, but that means we wouldn't be able to
identify and fix PD_ALL_VISIBLE/VM-all-visible mismatches.

> It's not clear to me why we are pinning the page in lazy_scan_heap(), before
> it's clear that we need it, either.  But there the cost is often very low,
> because we have a lot of sequential accesses.  But here we might be called
> from an index scan, with very little locality of access.

Now that, as of v35, we check for VM corruption unconditionally at the
start of heap_page_prune_and_freeze() and check the VM to potentially
exit early, there's no benefit in deferring pinning the VM in either
vacuum or on-access.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-03 07:32  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Chao Li @ 2026-03-03 07:32 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 3, 2026, at 07:38, Melanie Plageman <[email protected]> wrote:
> 
> On Fri, Feb 20, 2026 at 12:59 PM Andres Freund <[email protected]> wrote:
>> 
>> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
>> 
>>> I could see an argument for moving identify_and_fix_vm_corruption()
>>> out of the helper and into heap_page_prune_and_freeze() but then we'd
>>> have to move visibilitymap_get_status() out too. And that takes away a
>>> lot of the benefit of encapsulating all that logic.
>> 
>> I was wondering about that option. Relatedly, I also was wondering if we ought
>> to do identify_and_fix_vm_corruption() regardless of ->attempt_update_vm.
> 
> Attached v35 does this. I always pin the vmbuffer if we are going to
> prune in heap_page_prune_opt(). In many cases, because it's saved in
> the scan descriptor, it won't actually need to take a new pin. During
> pruning, I check for VM corruption even if I am not considering
> setting the VM.
> 
>>> Well, after this patch set, clearing the VM does happen before we emit
>>> WAL for pruning.
>> 
>> That I think is a substantial improvement, the current (i.e. before your
>> series) placement really is pretty insane due to the guaranteed divergence it
>> causes.
>> 
>> I wonder if we actually should just force an FPI whenever we detect such
>> corruption, that way it would reliably be fixed on the standby as well.
> 
> Only problem is we would have to do an FPI of the VM page as well if
> we wanted the corruption to be reliably fixed on the standby.
> 
>>> It wouldn't be hard to move the corruption fixups to the beginning of
>>> heap_page_prune_and_freeze() in the new code structure.
>> 
>> As identify_and_fix_vm_corruption() needs lpdead_items, I'm not sure that's
>> true?
>> 
>> I wonder if at least the warning for the "(PageIsAllVisible(heap_page) &&
>> nlpdead_items > 0)" test should be moved to
>> heap_prune_record_dead_or_unused(). That way the WARNING could include the
>> offset number and it'd also work in the mark_unused_now case.
>> 
>> Perhaps it also should trigger for RECENTLY_DEAD, INSERT_IN_PROGRESS,
>> DELETE_IN_PROGRESS?
>> 
>> At that point the !page_all_visible && vm_all_visible part could indeed be
>> moved to the start of heap_page_prune_and_freeze()
> 
> I've done all this. There is a heap page/VM corruption check at the
> beginning of heap_page_prune_and_freeze() and then checking for
> corruption during pruning in the previously covered case (lpdead
> items) as well as the mark_unused_now case, and
> RECENTLY_DEAD/INSERT_IN_PROGRESS/DELETE_IN_PROGRESS.
> 
>>> Would it be worth it? What benefit would we get? Do you just feel that it
>>> should logically come first?
>> 
>> One insanity is that right now we will process all frozen pages over and over
>> due to the skip pages threshold, wasting a *lot* of CPU and memory bandwidth.
>> It'd be quite defensible to just skip processing the page once we determined
>> it's already all frozen.  But for that we'd probably want to do the
>> "page_all_visible && vm_all_visible" check before returning...
> 
> I've added a fast path to bypass pruning/freezing when the page is
> already all-visible. And I check for page_all_visible && vm_all_visible
> beforehand. The one downside this has is if there is a page marked
> all-frozen but has dead tuples on it, we'll never get to fix that
> corruption nor clean up the dead tuples. But the fast path kind of
> seems worth it to me.
> 
>>>> Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
>>>> HEAP_PAGE_PRUNE_UPDATE_VM would be set?
>>> 
>>> Yes, when setting the VM on-access, it is too expensive to call
>>> heap_prepare_freeze_tuple() on each tuple. I could work on trying to
>>> optimize it, but it isn't currently viable.
>> 
>> Is it too expensive to do so even when we already decided to do some pruning?
>> I am not surprised it's too expensive when there's not even a dead tuple on
>> the page.  But I am mildly surprised if it's too expensive to do when we'd WAL
>> log anyway?
> 
> It's not really possible in the current code structure to only call
> heap_prepare_freeze_tuple() when there are at least some prunable
> tuples. We go through the line pointers and record them as prunable at
> the same time we call heap_prepare_freeze_tuple(), so we won't know
> until we've examined all line pointers that there are no prunable
> tuples, at which point we will have called heap_prepare_freeze_tuple()
> for every tuple.
> 
>>> I think using all_frozen_except_dead while maintaining
>>> visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
>>> the potential to be confusing, though. We'd need to keep updating
>>> visibility_cutoff_xid when all_visible is false but
>>> all_frozen_except_dead is true as well as when all_visible is true.
>>> And because we don't care about all_visible_except_dead, it gets even
>>> more confusing to make sure we are maintaining the right variables in
>>> the right situations.
>> 
>> I suspect we should just track all of the horizons/cutoffs all the time. This
>> whole stuff about optimizing out a few conditional assignments complicates the
>> code substantially and feels extremely error prone to me.
> 
> I've done this in v35. I posted the freeze horizon tracking patch
> separately in [1] but it is in v35 as 0004. Tracking the newest live
> xid is in 0009. This also always tracks all_visible for all callers
> since I unconditionally pass the vmbuffer now. I still don't set the
> VM if the query is modifying the relation, though.
> 
>> I probably complained about this before, and it's not this patch's fault, but
>> PruneState->{all_visible,all_frozen} are imo confusingly named, due to
>> sounding like they describe the current state, rather than the possible state
>> after pruning.  It's not helped by this comment:
>> 
>>         * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
>>         * That's convenient for heap_page_prune_and_freeze() to use them to
>>         * decide whether to opportunistically freeze the page or not.  The
>>         * all_visible and all_frozen values ultimately used to set the VM are
>>         * adjusted to include LP_DEAD items after we determine whether or not to
>>         * opportunistically freeze.
>> 
>> "all-visible ... are adjusted to include LP_DEAD" ... - just reading that it's
>> hard to know what it means.
> 
> 0003 does the rename.
> 
>> The first thing to improve pruning performance that I would do is to introduce
>> a fastpath for pages that a) are already frozen b) do not have dead items (if
>> we're not freezing). Iterating through HOT chains is far from cheap, and if
>> all rows are live, there's not really a point in doing so.  This is
>> particularly important for VACUUMs where we end up freezing a ton of pages that
>> are already frozen, due to the silly skip_pages_threshold thing.
> 
> 0007 adds a fast path.
> 
>>> +static TransactionId
>>> +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
>>> +                              uint8 old_vmbits, uint8 new_vmbits,
>>> +                              TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
>>> +                              TransactionId visibility_cutoff_xid)
>>> +{
>>> +     TransactionId conflict_xid;
>>> +
>>> +     /*
>>> +      * We can omit the snapshot conflict horizon if we are not pruning or
>>> +      * freezing any tuples and are setting an already all-visible page
>>> +      * all-frozen in the VM.
>> 
>> Maybe mention when this can happen, because it's not immediately obvious.
> 
> I've added this to my TODO. I honestly can't think of a scenario where
> it can happen. But I remember spending quite a bit of time thinking
> about it on another occasion. The current code (in master) does
> specifically account for this scenario, which is why I kept the logic,
> but I'm not sure how it can happen.
> 
> I made all the other changes to specific comments you mentioned in
> your mail but I won't bore you with itemization.
> 
>>>      if (do_set_vm)
>>>              conflict_xid = visibility_cutoff_xid;
>>>      else if (do_freeze)
>>>              conflict_xid = frz_conflict_horizon;
>>>      else
>>>              conflict_xid = InvalidTransactionId;
>> 
>> Could it be worth checking that if (do_set_vm && do_freeze) the
>> frz_conflict_horizon won't be "violated" by using visibility_cutoff_xid instead?
> 
> Yes, as you mentioned off-list, this wasn't right. New code is like this
> 
> TransactionId conflict_xid = InvalidTransactionId;
> ...
>    if (do_set_vm)
>        conflict_xid = newest_live_xid;
>    if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
>        conflict_xid = newest_frozen_xid;
> 
>>> From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
>>> From: Melanie Plageman <[email protected]>
>>> Date: Wed, 17 Dec 2025 16:51:05 -0500
>>> Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
>>> level visibility
>>> 
>>> [...]
>>> 
>>> Because comparing a transaction ID against GlobalVisState is more
>>> expensive than comparing against a single XID, we defer this check until
>>> after scanning all tuples on the page.
>> 
>> Curious, is this a precaution or was this a measurable bottleneck?
> 
> I did see GlobalVisTestXidMaybeRunning() in a profile I did when it
> was still called for every HEAPTUPLE_LIVE tuple in
> heap_prune_record_unchanged_lp_normal(), but I don't have the profile
> or test case around anymore.
> 
> However, since I now unconditionally maintain the newest_live_xid,
> moving GlobalVisTestXidMaybeRunning() back into
> heap_prune_record_unchanged_lp_normal() wouldn't help us avoid any
> work. It would just make the values of prstate.set_all_visible and
> prstate.set_all_frozen more accurate sooner. But I don't think it's
> worth the extra function call since set_all_frozen and set_all_visible
> won't be totally "done" until after we decide whether or not to
> opportunistically freeze anyway.
> 
>>> @@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>>>      prune_freeze_plan(RelationGetRelid(params->relation),
>>>                                        buffer, &prstate, off_loc);
>>> 
>>> +     /*
>>> +      * After processing all the live tuples on the page, if the newest xmin
>>> +      * amongst them may be considered running by any snapshot, the page cannot
>>> +      * be all-visible.
>>> +      */
>>> +     if (prstate.all_visible &&
>>> +             TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
>> 
>> Any reason to test IsNormal rather than just IsValid()?  There should never be
>> a reason it's a valid but not "normal" xid, right?
> 
> Well the reason I did this was that the existing code in master
> tracking visibility_cutoff_xid only advances it if
> TransactionIdIsNormal(). I'm a bit confused about it too because it
> seems like we would still want to do it for bootstrap mode xids. But I
> see PageSetPrunable() only allows normal xids.
> 
>>> @@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
>>>                              }
>>> 
>>>                              /*
>>> -                              * The inserter definitely committed.  But is it old enough
>>> -                              * that everyone sees it as committed?  A FrozenTransactionId
>>> -                              * is seen as committed to everyone.  Otherwise, we check if
>>> -                              * there is a snapshot that considers this xid to still be
>>> -                              * running, and if so, we don't consider the page all-visible.
>>> +                              * The inserter definitely committed. But we don't know if it
>>> +                              * is old enough that everyone sees it as committed. Later,
>>> +                              * after processing all the tuples on the page, we'll check if
>>> +                              * there is any snapshot that still considers the newest xid
>>> +                              * on the page to be running. If so, we don't consider the
>>> +                              * page all-visible.
>>>                               */
>>>                              xmin = HeapTupleHeaderGetXmin(htup);
>>> 
>>> -                             /*
>>> -                              * For now always use prstate->cutoffs for this test, because
>>> -                              * we only update 'all_visible' and 'all_frozen' when freezing
>>> -                              * is requested. We could use GlobalVisTestIsRemovableXid
>>> -                              * instead, if a non-freezing caller wanted to set the VM bit.
>>> -                              */
>>> -                             Assert(prstate->cutoffs);
>>> -                             if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
>>> -                             {
>>> -                                     prstate->all_visible = false;
>>> -                                     prstate->all_frozen = false;
>>> -                                     break;
>>> -                             }
>>> -
>>>                              /* Track newest xmin on page. */
>>>                              if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
>>>                                      TransactionIdIsNormal(xmin))
>> 
>> Kinda wonder if this code should be in something like
>> heap_prune_record_freezable() or such, rather than be inside
>> heap_prune_record_unchanged_lp_normal().
> 
> I played around with it, but it all felt a bit awkward. I wrote it
> down for a future enhancement idea.
> 
>>> Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
>>> 
>>> In the prune/freeze path, we currently delay clearing all_visible and
>>> all_frozen in the presence of dead items to allow opportunistic
>>> freezing.
>>> 
>>> However, if no freezing will be attempted, there’s no need to delay.
>>> Clearing the flags earlier avoids extra bookkeeping in
>>> heap_prune_record_unchanged_lp_normal(). This currently has no runtime
>>> effect because all callers that consider setting the VM also prepare
>>> freeze plans, but upcoming changes will allow on-access pruning to set
>>> the VM without freezing. The extra bookkeeping was noticeable in a
>>> profile of on-access VM setting.
>> 
>> What workload was that?
> 
> It was a select * offset all query with a few fat tuples on each page
> and none of them prunable. I'm planning on digging up the
> case/creating a new one to see if it is reproducible. This was with an
> older version of the code that had more conditionals as well. This
> commit is actually dropped in v35 because I now always keep
> newest_live_xid up-to-date (0009) which means unsetting
> set_all_visible sooner has no benefit.
> 
>> Theoretically, even if we don't freeze, the page still may be all-visible or
>> all frozen after the removal of dead items, no? Practically that won't happen,
>> because we don't remove dead items in any of the relevant paths, but from the
>> commit message and comments that's not entirely clear.
> 
> Yea, it's clearer with the commit dropped.
> 
>>> @@ -678,6 +678,12 @@ typedef struct EState
>>>                                                                       * ExecDoInitialPruning() */
>>>      const char *es_sourceText;      /* Source text from QueryDesc */
>>> 
>>> +     /*
>>> +      * RT indexes of relations modified by the query through a
>>> +      * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
>>> +      */
>>> +     Bitmapset  *es_modified_relids;
>>> +
>> 
>> Other EState fields are initialized in CreateExecutorState, this isn't afaict?
> 
> Oops, yes. I based it on es_unpruned_relids which wasn't initialized
> there either. I've added a commit (0013) to initialize a few EState
> fields that weren't initialized in CreateExecutorState() as well.
> 
>> Wonder if it's worth adding a crosscheck somewhere, verifying that if a
>> relation is modified, it's in es_modified_relids. Otherwise this could very
>> well silently get out of date.
> 
> Done in v35 (0014).
> 
>> Also, there's some overlap between the information collected this way, and
>> AcquireExecutorLocks(), ScanQueryForLocks(), which determine the needed lock
>> modes via rte->rellockmode.
> 
> Those are in parser/planner, so it doesn't seem like a good fit. I
> populate es_modified_relids in the executor.
> 
> I don't know exactly what the overlap would be between RTEs with an
> exclusive rellockmode and es_modified_relids. It seems like you could
> have RTEs which don't end up getting modified that have a lock level
> that would have made you think that they would be modified.
> 
> But were you imagining a substitution or a cross-check?
> 
>>> From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
>>> From: Melanie Plageman <[email protected]>
>>> Date: Wed, 3 Dec 2025 15:12:18 -0500
>>> Subject: [PATCH v34 12/14] Pass down information on table modification to scan
>>> node
>> 
>> Perhaps worth splitting up, so the addition of the 0 flag is separate from
>> the read only hint aspect.
> 
> Done.
> 
> [1] https://www.postgresql.org/message-id/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gma...
> <v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch>
> <v35-0002-Add-PageGetPruneXid-helper.patch>
> <v35-0003-Rename-PruneState-all_visible-all_frozen.patch>
> <v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch>
> <v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch>
> <v35-0006-Fix-visibility-map-corruption-in-more-cases.patch>
> <v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch>
> <v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch>
> <v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch>
> <v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch>
> <v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch>
> <v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch>
> <v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch>
> <v35-0014-Track-which-relations-are-modified-by-a-query.patch>
> <v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch>
> <v35-0016-Pass-down-information-on-table-modification-to-s.patch>
> <v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch>
> <v35-0018-Set-pd_prune_xid-on-insert.patch>

1 - 0001
```
+prune_freeze_plan(PruneState *prstate, OffsetNumber *off_loc)
 {
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	Page		page = prstate->page;
+	BlockNumber blockno = prstate->block;
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
```

As there is a local “page”, maybe just use the local one for PageGetMaxOffsetNumber.

0002 looks good.

2 - 0003 - Does it make sense to also do the same renaming in PruneFreezeResult?

3 - 0004
```
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->set_all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
+		Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
+											 prstate->cutoffs->OldestXmin));
```

At the point of this Assert, can prstate->pagefrz.FreezePageConflictXid be InvalidTransactionId? My understanding is that it cannot. In that case, would it make sense to also Assert(prstate->pagefrz.FreezePageConflictXid != InvalidTransactionId)?

Otherwise, if prstate->pagefrz.FreezePageConflictXid can still be InvalidTransactionId, then the Assert should be changed to something like:

Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
       TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));

I will continue with 0005 tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-03 15:52  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-03 15:52 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Mar 3, 2026 at 2:33 AM Chao Li <[email protected]> wrote:
>
> 2 - 0003 - Does it make sense to also do the same renaming in PruneFreezeResult?

I could do that. Later commits remove them, so I thought it didn't
make sense. If only this commit goes in though, it would make sense.

> -                * Calculate what the snapshot conflict horizon should be for a record
> -                * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
> -                * for conflicts when the whole page is eligible to become all-frozen
> -                * in the VM once we're done with it. Otherwise, we generate a
> -                * conservative cutoff by stepping back from OldestXmin.
> -                */
> -               if (prstate->set_all_frozen)
> -                       prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
> -               else
> -               {
> -                       /* Avoids false conflicts when hot_standby_feedback in use */
> -                       prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
> -                       TransactionIdRetreat(prstate->frz_conflict_horizon);
> -               }
> +               Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
> +                                                                                        prstate->cutoffs->OldestXmin));
> ```
>
> At the point of this Assert, can prstate->pagefrz.FreezePageConflictXid be InvalidTransactionId? My understanding is that it cannot. In that case, would it make sense to also Assert(prstate->pagefrz.FreezePageConflictXid != InvalidTransactionId)?

I think it is possible if we are doing some kind of freezing to a
multixact that we reach here and FreezePageConflictXid is
InvalidTransactionId.

> Otherwise, if prstate->pagefrz.FreezePageConflictXid can still be InvalidTransactionId, then the Assert should be changed to something like:
>
> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>        TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));

This is covered by TransactionIdPrecedesOrEquals because
InvalidTransactionId is 0. We assume that in many places throughout
the code.
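
To illustrate why the unconditional Assert already passes for InvalidTransactionId, here is a minimal standalone sketch. It is a simplified stand-in modeled on the XID comparison logic, not the actual PostgreSQL source: the function name xid_precedes_or_equals() and the helper check_examples() are hypothetical, and the constants are redefined locally so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Local redefinitions for illustration; 0 is invalid, 3 is the first normal XID. */
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical stand-in for TransactionIdPrecedesOrEquals(): if either XID
 * is not "normal", compare as plain unsigned integers; otherwise compare
 * modulo 2^32 so that wraparound is handled.
 */
bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 <= id2;

	diff = (int32_t) (id1 - id2);
	return diff <= 0;
}

/* A few checks showing how the invalid XID and wraparound behave. */
void
check_examples(void)
{
	/* InvalidTransactionId (0) precedes-or-equals any normal cutoff. */
	assert(xid_precedes_or_equals(InvalidTransactionId, 1000));
	/* A newer normal XID does not precede an older one. */
	assert(!xid_precedes_or_equals(2000, 1000));
	/* Just before wraparound still counts as older than just after it. */
	assert(xid_precedes_or_equals(4294967290u, 10));
}
```

Under these assumptions, a conflict XID of 0 compares as less than or equal to any normal OldestXmin, so the single-argument Assert holds without a separate InvalidTransactionId branch.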

> I will continue with 0005 tomorrow.

Thanks for the review!

I noticed a serious bug in v35-0017: I pass hscan->modifies_base_rel
to heap_page_prune_opt() as rel_read_only, which is the opposite of
what I want to do -- it should be !hscan->modifies_base_rel. I'm going
to wait to fix it though and post a new v36 once I've batched up more
fixups.

- Melanie

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-04 08:59  Chao Li <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Chao Li @ 2026-03-04 08:59 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 3, 2026, at 23:52, Melanie Plageman <[email protected]> wrote:
> 
> 
>> Otherwise, if prstate->pagefrz.FreezePageConflictXid can still be InvalidTransactionId, then the Assert should be changed to something like:
>> 
>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>        TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));
> 
> This is covered by TransactionIdPrecedesOrEquals because
> InvalidTransactionId is 0. We assume that in many places throughout
> the code.
> 

I understand that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but the bare Assert could give code readers the impression that prstate->pagefrz.FreezePageConflictXid cannot be InvalidTransactionId. My version makes it explicit that prstate->pagefrz.FreezePageConflictXid can be InvalidTransactionId at this point.


>> I will continue with 0005 tomorrow.
> 

4 - 0005
```
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
```

I don’t see why vmbuffer has to be a pointer. Buffer is an int underneath, and I checked the last commit: vmbuffer only passes data into the function and nothing is passed back out.

Since we are adding the new parameter vmbuffer, even though it’s not used in this commit, I think it’d be better to update the header comment to explain what this parameter is for.
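
For comparison, a pointer parameter only matters if the callee may replace the buffer the caller is holding, which is the reason visibilitymap_pin() takes a Buffer *. The sketch below shows the difference in isolation. The function names pin_by_value() and pin_by_pointer() are hypothetical, and Buffer is redefined locally as an int so the snippet stands alone; it does not touch any real buffer manager.

```c
#include <assert.h>

typedef int Buffer;				/* local stand-in; same underlying type */
#define InvalidBuffer 0

/*
 * Hypothetical: vmbuffer is passed by value, so assigning to it only
 * changes the callee's copy. The caller never learns about a new buffer.
 */
void
pin_by_value(Buffer vmbuffer, Buffer wanted)
{
	if (vmbuffer != wanted)
		vmbuffer = wanted;
	(void) vmbuffer;
}

/*
 * Hypothetical: vmbuffer is passed by pointer, so the callee can hand a
 * different buffer back to the caller for reuse on the next call.
 */
void
pin_by_pointer(Buffer *vmbuffer, Buffer wanted)
{
	if (*vmbuffer != wanted)
		*vmbuffer = wanted;
}
```

If heap_page_prune_opt() never needs to hand a buffer back, the by-value form would be enough; if a later commit pins the VM page inside the function, the pointer form would be needed.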

5  - 0006
```
+ *
+ * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
+ * page, but it does not need to be done in a critical section because
+ * clearing the VM is not WAL-logged.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
```

Nit: why does the last paragraph of the header comment use the function name instead of “this function”? Looks like a copy-pasto.

6 - 0006
```
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("LP_DEAD item found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
```

I recently learned that a detail message should use complete sentences, capitalize the first word of each sentence, and end each with a period. See https://www.postgresql.org/docs/current/error-style-guide.html.

7 - 0006
```
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear.  However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck with buffer lock before concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
+						prstate->block,
+						RelationGetRelationName(prstate->relation))));
+	}
```

The comment says “we must recheck with buffer lock before…”, but the code only logs a warning message. Is the comment stale?

8 - 0007
```
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	presult->vmbits = prstate->vmbits;
+
+	if (!PageIsEmpty(page))
+		presult->hastup = true;
+}
```

* Given that this function already calls PageIsEmpty(page), if that returns true, we don’t need to count live_tuples, right? That could be a tiny optimization.
* I see heap_page_bypass_prune_freeze() is only called in one place, immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0. Do we need to do presult->vmbits = prstate->vmbits;?
* Do we need to set all_visible and all_frozen in presult?

0008 LGTM

I will continue with 0009 tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-05 08:52  Chao Li <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Chao Li @ 2026-03-05 08:52 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 4, 2026, at 16:59, Chao Li <[email protected]> wrote:
> 
> 
> 
>> On Mar 3, 2026, at 23:52, Melanie Plageman <[email protected]> wrote:
>> 
>> 
>>> Otherwise, if prstate->pagefrz.FreezePageConflictXid is still possibly be InvalidTransactionId, then the Assert should be changed to something like:
>>> 
>>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>> TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin)
>> 
>> This is covered by TransactionIdPrecedesOrEquals because
>> InvalidTransactionId is 0. We assume that in many places throughout
>> the code.
>> 
> 
> I understood that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but that would leave an impression to code readers that prstate->pagefrz.FreezePageConflictXid could not be InvalidTransactionId. Thus I think my version explicitly tells that prstate->pagefrz.FreezePageConflictXid could be InvalidTransactionId at the point.
> 
> 
>>> I will continue with 0005 tomorrow.
>> 
> 
> 4 - 0005
> ```
>  * Caller must have pin on the buffer, and must *not* have a lock on it.
>  */
> void
> -heap_page_prune_opt(Relation relation, Buffer buffer)
> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> ```
> 
> I don’t see why vmbuffer has to be of pointer type. Buffer type is underlying int, I checked the last commit, vmbuffer only passes in data into the function without passing out anything.
> 
> As we add the new parameter vmbuffer, though it’s not used in this commit, I think it’d be better to update the header commit to explain what this parameter will do.
> 
> 5  - 0006
> ```
> + *
> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
> + * page, but it does not need to be done in a critical section because
> + * clearing the VM is not WAL-logged.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> ```
> 
> Nit: why the last paragraph of the header comments uses the function name instead of “this function”? Looks like a copy-pasto.
> 
> 6 - 0006
> ```
> + if (prstate->lpdead_items > 0)
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("LP_DEAD item found on page marked as all-visible"),
> + errdetail("relation \"%s\", page %u, tuple %u",
> +   RelationGetRelationName(prstate->relation),
> +   prstate->block, offnum)));
> + }
> + else
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("tuple not visible to all found on page marked as all-visible"),
> + errdetail("relation \"%s\", page %u, tuple %u",
> +   RelationGetRelationName(prstate->relation),
> +   prstate->block, offnum)));
> + }
> ```
> 
> I recently just learned that a detail message should use complete sentences, and end each with a period, and capitalize the first word of sentences. See https://www.postgresql.org/docs/current/error-style-guide.html.
> 
> 7 - 0006
> ```
> + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> + {
> + /*
> + * As of PostgreSQL 9.2, the visibility map bit should never be set if
> + * the page-level bit is clear.  However, it's possible that the bit
> + * got cleared after heap_vac_scan_next_block() was called, so we must
> + * recheck with buffer lock before concluding that the VM is corrupt.
> + */
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
> + prstate->block,
> + RelationGetRelationName(prstate->relation))));
> + }
> ```
> 
> The comment says “we must recheck with buffer lock before…”, but it only log a warning message. Is the comment stale?
> 
> 8 - 0007
> ```
> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
> + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> + Page page = prstate->page;
> +
> + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> +   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> + !prstate->attempt_freeze));
> +
> + /* We'll fill in presult for the caller */
> + memset(presult, 0, sizeof(PruneFreezeResult));
> +
> + /*
> + * Since the page is all-visible, a count of the normal ItemIds on the
> + * page should be sufficient for vacuum's live tuple count.
> + */
> + for (OffsetNumber off = FirstOffsetNumber;
> + off <= maxoff;
> + off = OffsetNumberNext(off))
> + {
> + if (ItemIdIsNormal(PageGetItemId(page, off)))
> + prstate->live_tuples++;
> + }
> +
> + presult->live_tuples = prstate->live_tuples;
> +
> + /* Clear any stale prune hint */
> + if (TransactionIdIsValid(PageGetPruneXid(page)))
> + {
> + PageClearPrunable(page);
> + MarkBufferDirtyHint(prstate->buffer, true);
> + }
> +
> + presult->vmbits = prstate->vmbits;
> +
> + if (!PageIsEmpty(page))
> + presult->hastup = true;
> +}
> ```
> 
> * Given this function has done PageIsEmpty(page), that that is true, we don’t need to count live_tuples, right? That could be a tiny optimization.
> * I see heap_page_bypass_prune_freeze() is only called in one place and immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0, so do we need to do presult->vmbits = prstate->vmbits;?
> * Do we need to set all_visible and all_frozen to presult?
> 
> 0008 LGTM
> 
> I will continue with 0009 tomorrow.
> 

9 - 0009
```
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.all_frozen to false,
```

Nit: prstate.all_frozen -> prstate.set_all_frozen

I see you fixed this in 0010, but I think it’s better to also fix it here.

10 - 0010
```
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
```

These 3 fields are actually counts rather than block numbers, so using type BlockNumber is quite confusing, even though BlockNumber is uint32 underneath. I think they could just be of type int.

11 - 0010
```
+ BlockNumber new_all_visible_pages;
+ BlockNumber new_all_visible_frozen_pages;
+ BlockNumber new_all_frozen_pages;
```

I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
```
    PruneFreezeResult presult;
```
So those fields will hold indeterminate values.
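
As a rough sketch of two ways to avoid that, assuming a cut-down struct that contains only the three new counters (PruneFreezeResultSketch, make_result() and reset_result() are hypothetical names, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNumber;	/* local stand-in for illustration */

/* Hypothetical reduced version of PruneFreezeResult with only the new fields. */
typedef struct PruneFreezeResultSketch
{
	BlockNumber new_all_visible_pages;
	BlockNumber new_all_visible_frozen_pages;
	BlockNumber new_all_frozen_pages;
} PruneFreezeResultSketch;

/* Option 1: zero-initialize where the variable is declared. */
PruneFreezeResultSketch
make_result(void)
{
	PruneFreezeResultSketch presult = {0};

	return presult;
}

/* Option 2: have the callee clear the whole struct before filling it in. */
void
reset_result(PruneFreezeResultSketch *presult)
{
	memset(presult, 0, sizeof(*presult));
}
```

The fast path in 0007 already follows the second pattern with its memset of presult, so the regular path could do the same.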

12 - 0010
```
+	 * conflict would ahve been handled in reaction to the WAL record freezing
```

Nit: ahve -> have

0011 LGTM

13 - 0012 - bufmask.c
```
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
```

I can’t find a function named heap_xlog_prune_and_freeze().

14 - 0012 - heapam_xlog.c
```
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
```

Same as 13.

0013 LGTM

I will try to finish the remaining 5 commits tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-06 02:40  Chao Li <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Chao Li @ 2026-03-06 02:40 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 5, 2026, at 16:52, Chao Li <[email protected]> wrote:
> 
> 
> 
>> On Mar 4, 2026, at 16:59, Chao Li <[email protected]> wrote:
>> 
>> 
>> 
>>> On Mar 3, 2026, at 23:52, Melanie Plageman <[email protected]> wrote:
>>> 
>>> 
>>>> Otherwise, if prstate->pagefrz.FreezePageConflictXid is still possibly be InvalidTransactionId, then the Assert should be changed to something like:
>>>> 
>>>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>>> TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin)
>>> 
>>> This is covered by TransactionIdPrecedesOrEquals because
>>> InvalidTransactionId is 0. We assume that in many places throughout
>>> the code.
>>> 
>> 
>> I understood that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but that would leave an impression to code readers that prstate->pagefrz.FreezePageConflictXid could not be InvalidTransactionId. Thus I think my version explicitly tells that prstate->pagefrz.FreezePageConflictXid could be InvalidTransactionId at the point.
>> 
>> 
>>>> I will continue with 0005 tomorrow.
>>> 
>> 
>> 4 - 0005
>> ```
>> * Caller must have pin on the buffer, and must *not* have a lock on it.
>> */
>> void
>> -heap_page_prune_opt(Relation relation, Buffer buffer)
>> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>> ```
>> 
>> I don’t see why vmbuffer has to be of pointer type. Buffer type is underlying int, I checked the last commit, vmbuffer only passes in data into the function without passing out anything.
>> 
>> As we add the new parameter vmbuffer, though it’s not used in this commit, I think it’d be better to update the header commit to explain what this parameter will do.
>> 
>> 5  - 0006
>> ```
>> + *
>> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
>> + * page, but it does not need to be done in a critical section because
>> + * clearing the VM is not WAL-logged.
>> + */
>> +static void
>> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
>> ```
>> 
>> Nit: why the last paragraph of the header comments uses the function name instead of “this function”? Looks like a copy-pasto.
>> 
>> 6 - 0006
>> ```
>> + if (prstate->lpdead_items > 0)
>> + {
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("LP_DEAD item found on page marked as all-visible"),
>> + errdetail("relation \"%s\", page %u, tuple %u",
>> +   RelationGetRelationName(prstate->relation),
>> +   prstate->block, offnum)));
>> + }
>> + else
>> + {
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("tuple not visible to all found on page marked as all-visible"),
>> + errdetail("relation \"%s\", page %u, tuple %u",
>> +   RelationGetRelationName(prstate->relation),
>> +   prstate->block, offnum)));
>> + }
>> ```
>> 
>> I recently just learned that a detail message should use complete sentences, and end each with a period, and capitalize the first word of sentences. See https://www.postgresql.org/docs/current/error-style-guide.html.
>> 
>> 7 - 0006
>> ```
>> + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
>> + {
>> + /*
>> + * As of PostgreSQL 9.2, the visibility map bit should never be set if
>> + * the page-level bit is clear.  However, it's possible that the bit
>> + * got cleared after heap_vac_scan_next_block() was called, so we must
>> + * recheck with buffer lock before concluding that the VM is corrupt.
>> + */
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
>> + prstate->block,
>> + RelationGetRelationName(prstate->relation))));
>> + }
>> ```
>> 
>> The comment says “we must recheck with buffer lock before…”, but it only log a warning message. Is the comment stale?
>> 
>> 8 - 0007
>> ```
>> +static void
>> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
>> +{
>> + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
>> + Page page = prstate->page;
>> +
>> + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
>> +   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
>> + !prstate->attempt_freeze));
>> +
>> + /* We'll fill in presult for the caller */
>> + memset(presult, 0, sizeof(PruneFreezeResult));
>> +
>> + /*
>> + * Since the page is all-visible, a count of the normal ItemIds on the
>> + * page should be sufficient for vacuum's live tuple count.
>> + */
>> + for (OffsetNumber off = FirstOffsetNumber;
>> + off <= maxoff;
>> + off = OffsetNumberNext(off))
>> + {
>> + if (ItemIdIsNormal(PageGetItemId(page, off)))
>> + prstate->live_tuples++;
>> + }
>> +
>> + presult->live_tuples = prstate->live_tuples;
>> +
>> + /* Clear any stale prune hint */
>> + if (TransactionIdIsValid(PageGetPruneXid(page)))
>> + {
>> + PageClearPrunable(page);
>> + MarkBufferDirtyHint(prstate->buffer, true);
>> + }
>> +
>> + presult->vmbits = prstate->vmbits;
>> +
>> + if (!PageIsEmpty(page))
>> + presult->hastup = true;
>> +}
>> ```
>> 
>> * Given this function has done PageIsEmpty(page), that that is true, we don’t need to count live_tuples, right? That could be a tiny optimization.
>> * I see heap_page_bypass_prune_freeze() is only called in one place and immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0, so do we need to do presult->vmbits = prstate->vmbits;?
>> * Do we need to set all_visible and all_frozen to presult?
>> 
>> 0008 LGTM
>> 
>> I will continue with 0009 tomorrow.
>> 
> 
> 9 - 0009
> ···
> +  * Currently, only VACUUM performs freezing, but other callers may in the
> +  * future. Other callers must initialize prstate.all_frozen to false,
> ···
> 
> Nit: prstate.all_frozen -> prstate.set_all_frozen
> 
> I saw you have fixed this in 0010, but I think it’s better also fix it here.
> 
> 10 - 0010
> ```
> +  * Whether or not the page was newly set all-visible and all-frozen during
> +  * phase I of vacuuming.
>  */
> - uint8 vmbits;
> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
> ```
> 
> These 3 fields are actually counts rather than pointers to blocks, using type BlockNumber are quite confusing, though underlying BlockNumber is uint32. I think they can be just int type.
> 
> 11 - 0010
> ```
> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
> ```
> 
> I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
> ```
>    PruneFreezeResult presult;
> ```
> So, those fields will hold random values.
> 
> 12 - 0010
> ```
> +  * conflict would ahve been handled in reaction to the WAL record freezing
> ```
> 
> Nit: ahve -> have
> 
> 0011 LGTM
> 
> 13 - 0012 - bufmask.c
> ```
> +  * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
> +  * for more details.
> ```
> 
> I don’t find a function named heap_xlog_prune_and_freeze().
> 
> 14 - 0012 - heapam_xlog.c
> ```
> +  * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
> +  * heap_xlog_prune_and_freeze()).
> ```
> 
> Same as 13.
> 
> 0013 LGTM
> 
> I will try to finish the rest 5 commits tomorrow.
> 

15 - 0014 - execMain.c
```
@@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_range_table_size = parentestate->es_range_table_size;
 	rcestate->es_relations = parentestate->es_relations;
 	rcestate->es_rowmarks = parentestate->es_rowmarks;
+	rcestate->es_modified_relids = parentestate->es_modified_relids;
```

Here it just assigns the BMS pointer to rcestate->es_modified_relids. I am not sure whether further bms_add_member() calls can still happen. If so, it might be safer to use bms_copy(parentestate->es_modified_relids), because a later bms_add_member() may reallocate the set and leave the old pointer stale.

16 - 0014 - execUtils.c
```
for (rti = 1; rti <= estate->es_range_table_size; rti++)
```

Nit: I have seen several recent commits that performed cleanups to switch to use for loop var like:
```
for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
```

17 - 0015

The commit message subject line says “Make begin_scan() functions take a flags argument”, where begin_scan() seems inaccurate, for example, table_index_fetch_begin() is not “begin scan”.

Otherwise 0015 LGTM.

18 - 0016 - tableam.h
```
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
```

Nit: maybe add an empty line before the new flag.

19 - 0017 - heapam_handler.c
```
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								hscan->modifies_base_rel);
```

This feels like a bug. heap_page_prune_opt() takes rel_read_only as its last parameter, but hscan->modifies_base_rel means the opposite (not read-only), so we should pass “!hscan->modifies_base_rel” here.

Oh, rereading your previous email, I see you have already found this bug.

20 - 0018
In heap_insert(), you do:
```
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
```

But in heap_multi_insert(), you do:
```
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
```

Is the option check “!(options & HEAP_INSERT_FROZEN)” also needed in heap_multi_insert()?

~~ End of this round of review ~~

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-06 23:33  Melanie Plageman <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-06 23:33 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the review! Attached is v36. I've pushed some of the early
patches in the set and this is what is left. I've also done some of
the performance evaluation and microbenchmarking of the "worst case"
scenario promised in my earlier reply to Andres [1].

I used the following to test the worst-case performance of my patch:

pgbench -n -r -t 9 -f - <<'SQL'
checkpoint;
DROP TABLE IF EXISTS foo;
CREATE TABLE foo(a int, cnt int, data text) WITH (autovacuum_enabled =
false, fillfactor = 10);
ALTER TABLE foo ALTER COLUMN data SET STORAGE PLAIN;
CREATE INDEX ON foo(a);
INSERT INTO foo
SELECT i, 1, repeat(' ', 8192/10)
FROM generate_series(1,100000) i;
vacuum (freeze) foo;
update foo set cnt = cnt + 1;
select * from foo offset 100000000;
update foo set cnt = cnt + 1;
SQL

What I see is an expected slowdown for the SELECT * FROM foo OFFSET,
because it emits slightly more WAL and pins and dirties a few more
buffers, and a slight slowdown for the UPDATE following the SELECT,
because it then must clear those VM bits. (This is no different than
if you had run a vacuum before doing the update.)

These slowdowns are expected since this microbenchmark is designed to
be a worst case. Every buffer has a single tuple and the SELECT needs
to access no tuples because of the OFFSET. This minimizes all other
overheads to magnify the overhead of setting and clearing the VM.

I also tested whether unconditionally pinning the VM, even when we
don't set it, has any impact on the performance of on-access pruning
for logged tables. I used the setup above but patched the code to not
set the VM on-access. I found that there is no negative performance
impact on the SELECT * OFFSET. If foo is an unlogged table, I do see a
very slight overhead for the SELECT * OFFSET.

And in all cases, with the patch, the vacuum above is faster because
of using the combined WAL record.

I believe I've addressed all of your review feedback. Below are
combined inline remarks to all three of your emails:

On Wed, Mar 4, 2026 at 4:00 AM Chao Li <[email protected]> wrote:
>
>   * Caller must have pin on the buffer, and must *not* have a lock on it.
>   */
>  void
> -heap_page_prune_opt(Relation relation, Buffer buffer)
> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> ```
>
> I don’t see why vmbuffer has to be of pointer type. Buffer type is underlying int, I checked the last commit, vmbuffer only passes in data into the function without passing out anything.

We want to save the vmbuffer in the scan descriptor so we can use it
across calls to heap_page_prune_opt(). Therefore we have to pass it by
reference. We pin the VM in heap_page_prune_opt() and if we don't save
a reference to it, we'll have to pin it again on the next call (see
visibilitymap_pin() code).
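
To illustrate the point, here is a toy model (not the patch's code).
Buffer, the pin counter, and every toy_* name below are hypothetical
stand-ins, and the blocks-per-VM-page constant is deliberately tiny.
It only shows that a by-value Buffer loses the pin between calls,
while a by-reference one lets consecutive heap blocks covered by the
same VM page share a single pin:

```c
#include <assert.h>
#include <stdbool.h>

typedef int Buffer;                 /* toy stand-in for PostgreSQL's Buffer */
#define InvalidBuffer 0
#define TOY_HEAP_BLOCKS_PER_VM_PAGE 4   /* toy value, far below the real one */

static int pin_calls = 0;           /* counts simulated buffer-manager pins */

/* Toy mapping: VM page n lives in "buffer" n + 1, so 0 stays invalid. */
static Buffer
toy_read_vm_page(int vm_page)
{
    pin_calls++;
    return vm_page + 1;
}

/* Pin the VM page covering heap_blk unless *vmbuffer already holds it. */
static void
toy_vm_pin(int heap_blk, Buffer *vmbuffer)
{
    int vm_page = heap_blk / TOY_HEAP_BLOCKS_PER_VM_PAGE;

    if (*vmbuffer != InvalidBuffer && *vmbuffer == vm_page + 1)
        return;                     /* pin from an earlier call is reused */

    /* real code would release the previously pinned page here */
    *vmbuffer = toy_read_vm_page(vm_page);
}

/* By reference: the pin is remembered in the caller's storage. */
static void
toy_prune_by_ref(int heap_blk, Buffer *vmbuffer)
{
    toy_vm_pin(heap_blk, vmbuffer);
}

/* By value: the callee's update is lost when the function returns. */
static void
toy_prune_by_value(int heap_blk, Buffer vmbuffer)
{
    toy_vm_pin(heap_blk, &vmbuffer);
}

/* Scan nblocks heap blocks and report how many pins were taken. */
static int
toy_pins_for_scan(bool by_ref, int nblocks)
{
    Buffer vmbuffer = InvalidBuffer;

    assert(nblocks >= 0);
    pin_calls = 0;
    for (int blk = 0; blk < nblocks; blk++)
    {
        if (by_ref)
            toy_prune_by_ref(blk, &vmbuffer);
        else
            toy_prune_by_value(blk, vmbuffer);
    }
    return pin_calls;
}
```

In this model, scanning 8 heap blocks takes 2 pins when the buffer is
passed by reference and 8 when it is passed by value.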

> As we add the new parameter vmbuffer, though it’s not used in this commit, I think it’d be better to update the header commit to explain what this parameter will do.

Thanks, I've updated the header comment.

> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
> + * page, but it does not need to be done in a critical section because
> + * clearing the VM is not WAL-logged.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
>
> Nit: why the last paragraph of the header comments uses the function name instead of “this function”? Looks like a copy-pasto.

Fixed.

> 6 - 0006
> ```
> +               if (prstate->lpdead_items > 0)
> +               {
> +                       ereport(WARNING,
> +                                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                                        errmsg("LP_DEAD item found on page marked as all-visible"),
> +                                        errdetail("relation \"%s\", page %u, tuple %u",
> +                                                          RelationGetRelationName(prstate->relation),
> +                                                          prstate->block, offnum)));
> +               }
> +               else
> +               {
> +                       ereport(WARNING,
> +                                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                                        errmsg("tuple not visible to all found on page marked as all-visible"),
> +                                        errdetail("relation \"%s\", page %u, tuple %u",
> +                                                          RelationGetRelationName(prstate->relation),
> +                                                          prstate->block, offnum)));
> +               }
> ```
>
> I recently just learned that a detail message should use complete sentences, and end each with a period, and capitalize the first word of sentences. See https://www.postgresql.org/docs/current/error-style-guide.html.

Ah thanks for noticing. I've gone ahead and changed them to errcontext
instead of errdetail. I think the messages are more compliant now.

> +       else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> +       {
> +               /*
> +                * As of PostgreSQL 9.2, the visibility map bit should never be set if
> +                * the page-level bit is clear.  However, it's possible that the bit
> +                * got cleared after heap_vac_scan_next_block() was called, so we must
> +                * recheck with buffer lock before concluding that the VM is corrupt.
> +                */
> +               ereport(WARNING,
> +                               (errcode(ERRCODE_DATA_CORRUPTED),
> +                                errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
> +                                               prstate->block,
> +                                               RelationGetRelationName(prstate->relation))));
> +       }
>
> The comment says “we must recheck with buffer lock before…”, but it only log a warning message. Is the comment stale?

We have the buffer lock here. The comment means that we need to check
again now, while we hold the buffer lock, because we did not hold it
when we checked in heap_vac_scan_next_block(). I've updated the
comment to try to make that clearer.

> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
>
> * Given this function has done PageIsEmpty(page), that that is true, we don’t need to count live_tuples, right? That could be a tiny optimization.

Okay, I've tried this. I didn't want a lot of indentation, so I
reorganized the code. I'm not sure if it is more error-prone now,
though...

> * I see heap_page_bypass_prune_freeze() is only called in one place and immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0, so do we need to do presult->vmbits = prstate->vmbits;?

Actually vmbits can't be zero, otherwise we won't reach the fast path
code. Or do you mean something else?

> * Do we need to set all_visible and all_frozen to presult?

I memset to 0 the other fields, so it isn't needed.

On Thu, Mar 5, 2026 at 3:53 AM Chao Li <[email protected]> wrote:
>
> 9 - 0009
> ···
> +        * Currently, only VACUUM performs freezing, but other callers may in the
> +        * future. Other callers must initialize prstate.all_frozen to false,
> ···
>
> Nit: prstate.all_frozen -> prstate.set_all_frozen
>
> I saw you have fixed this in 0010, but I think it’s better also fix it here.

Done.

> 10 - 0010
> ```
> +        * Whether or not the page was newly set all-visible and all-frozen during
> +        * phase I of vacuuming.
>          */
> -       uint8           vmbits;
> +       BlockNumber new_all_visible_pages;
> +       BlockNumber new_all_visible_frozen_pages;
> +       BlockNumber new_all_frozen_pages;
> ```
>
> These 3 fields are actually counts rather than pointers to blocks, using type BlockNumber are quite confusing, though underlying BlockNumber is uint32. I think they can be just int type.

Covered in [2].

> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
>
> I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
>     PruneFreezeResult presult;
> So, those fields will hold random values.

Yes, thank you. I've fixed that.

> 13 - 0012 - bufmask.c
> ```
> +        * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
> +        * for more details.
> ```
>
> I don’t find a function named heap_xlog_prune_and_freeze().

Fixed in both places (-> heap_xlog_prune_freeze()).

On Thu, Mar 5, 2026 at 9:41 PM Chao Li <[email protected]> wrote:
>
> 15 - 0014 - execMain.c
> ```
> @@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
>         rcestate->es_range_table_size = parentestate->es_range_table_size;
>         rcestate->es_relations = parentestate->es_relations;
>         rcestate->es_rowmarks = parentestate->es_rowmarks;
> +       rcestate->es_modified_relids = parentestate->es_modified_relids;
> ```
>
> Here it just assigns the BMS pointer to rcestate->es_modified_relids. I am not sure if further bms_add_member() will still happen, if yes, it might be safer to do bms_copy(parentestate->es_modified_relids), because a further bms_add_member() may cause a new memory allocated and the old pointer stale.

Yes, it's at least a bit of future proofing. Done in v36.
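
For anyone following along, the aliasing concern can be shown with a
toy growable bitmap. This is only a sketch: ToySet and the toy_*
functions are hypothetical stand-ins for Bitmapset, bms_copy(),
bms_add_member() and bms_is_member(), and plain malloc/realloc stands
in for palloc/repalloc. Adding a member may reallocate the set, so
sharing one pointer between parent and child is fragile, while a copy
keeps the two independent:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BITS_PER_WORD 32

typedef struct ToySet
{
    int          nwords;
    unsigned int words[];       /* flexible array member */
} ToySet;

/* Return an independent copy of s (NULL stays NULL, the empty set). */
static ToySet *
toy_copy(const ToySet *s)
{
    size_t  size;
    ToySet *copy;

    if (s == NULL)
        return NULL;
    size = sizeof(ToySet) + (size_t) s->nwords * sizeof(unsigned int);
    copy = malloc(size);
    if (copy == NULL)
        abort();
    memcpy(copy, s, size);
    return copy;
}

/*
 * Add x to the set. The set may be reallocated, so callers must use the
 * returned pointer; any other pointer to the old set may now be stale.
 */
static ToySet *
toy_add_member(ToySet *s, int x)
{
    int wordnum = x / TOY_BITS_PER_WORD;
    int old_nwords = (s != NULL) ? s->nwords : 0;

    assert(x >= 0);
    if (wordnum >= old_nwords)
    {
        int new_nwords = wordnum + 1;

        s = realloc(s, sizeof(ToySet) +
                    (size_t) new_nwords * sizeof(unsigned int));
        if (s == NULL)
            abort();
        /* zero only the newly added words */
        memset(&s->words[old_nwords], 0,
               (size_t) (new_nwords - old_nwords) * sizeof(unsigned int));
        s->nwords = new_nwords;
    }
    s->words[wordnum] |= 1u << (x % TOY_BITS_PER_WORD);
    return s;
}

/* Report whether x is in the set. */
static bool
toy_is_member(const ToySet *s, int x)
{
    int wordnum = x / TOY_BITS_PER_WORD;

    if (s == NULL || x < 0 || wordnum >= s->nwords)
        return false;
    return ((s->words[wordnum] >> (x % TOY_BITS_PER_WORD)) & 1u) != 0;
}
```

With a copy, a later add on the child's set grows only the child's
allocation and the parent's set is left untouched.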

> 16 - 0014 - execUtils.c
> for (rti = 1; rti <= estate->es_range_table_size; rti++)
>
> Nit: I have seen several recent commits that performed cleanups to switch to use for loop var like:
> for (Index rti = 1; rti <= estate->es_range_table_size; rti++)

Updated.

> 17 - 0015
>
> The commit message subject line says “Make begin_scan() functions take a flags argument”, where begin_scan() seems inaccurate, for example, table_index_fetch_begin() is not “begin scan”.
>
> Otherwise 0015 LGTM.

I've rewritten the commit message.

> 20 - 0018
> In heap_insert(), you do:
> +       if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
> +               PageSetPrunable(page, xid);
>
> But in heap_multi_insert(), you do:
> +               if (!all_frozen_set && TransactionIdIsNormal(xid))
> +                       PageSetPrunable(page, xid);
>
> Is the option check " !(options & HEAP_INSERT_FROZEN))” also needed by heap_multi_insert?

heap_multi_insert() incorporates that into the variable
all_frozen_set, so it is not needed.

I've now also added setting the prune hint for the new page on
updates, which I forgot before.

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_a1V7TUUYM7qO2c5Z-JyTKOsrryQBrk7Eu69ESzhqgd9w%40mail.gma...
[2] https://www.postgresql.org/message-id/flat/CA%2BFpmFdrM%3DL5f%3De7%2BwqOkFkYK6r_S%3DTdKrHQ5qPbTNaoVG...


Attachments:

  [text/x-patch] v36-0001-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch (6.5K, 2-v36-0001-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch)
  download | inline diff:
From 6fe999048e0c3d5b268e5b34fb1af8a4621d24fe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 11:39:28 -0500
Subject: [PATCH v36 01/16] Use the newest to-be-frozen xid as the conflict
 horizon for freezing

Previously WAL records that froze tuples used OldestXmin as the snapshot
conflict horizon. However, OldestXmin is newer than the newest frozen
tuple's xid. By tracking the newest to-be-frozen xid and using it as the
snapshot conflict horizon instead, we end up with an older horizon that
will result in fewer query cancellations on the standby.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Peter Geoghegan <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gmail.com
---
 src/backend/access/heap/heapam.c    | 12 ++++++++++
 src/backend/access/heap/pruneheap.c | 34 +++++++++--------------------
 src/include/access/heapam.h         |  8 +++++++
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a231563f0df..649ee6e7669 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6781,6 +6781,10 @@ heap_inplace_unlock(Relation relation,
  * NB: Caller should avoid needlessly calling heap_tuple_should_freeze when we
  * have already forced page-level freezing, since that might incur the same
  * SLRU buffer misses that we specifically intended to avoid by freezing.
+ *
+ * We won't update the FreezePageConflictXid because any lockers don't affect
+ * visibility on the standby, and we don't have to worry about the update XID
+ * since the only way it can be older than OldestXmin is if it is aborted.
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
@@ -7173,7 +7177,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
 		/* Verify that xmin committed if and when freeze plan is executed */
 		if (freeze_xmin)
+		{
 			frz->checkflags |= HEAP_FREEZE_CHECK_XMIN_COMMITTED;
+			if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+				pagefrz->FreezePageConflictXid = xid;
+		}
 	}
 
 	/*
@@ -7192,6 +7200,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 */
 		replace_xvac = pagefrz->freeze_required = true;
 
+		if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+			pagefrz->FreezePageConflictXid = xid;
+
 		/* Will set replace_xvac flags in freeze plan below */
 	}
 
@@ -7501,6 +7512,7 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	pagefrz.freeze_required = true;
 	pagefrz.FreezePageRelfrozenXid = FreezeLimit;
 	pagefrz.FreezePageRelminMxid = MultiXactCutoff;
+	pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
 	pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 65c9f393f41..eebd6cf57ea 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -377,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/* initialize page freezing working state */
 	prstate->pagefrz.freeze_required = false;
+	prstate->pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
@@ -407,7 +408,6 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * PruneState.
 	 */
 	prstate->deadoffsets = presult->deadoffsets;
-	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * Vacuum may update the VM after we're done.  We can keep track of
@@ -746,22 +746,8 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(prstate->buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->set_all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
+		Assert(TransactionIdPrecedes(prstate->pagefrz.FreezePageConflictXid,
+									 prstate->cutoffs->OldestXmin));
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -953,17 +939,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * The snapshotConflictHorizon for the whole record should be the
 			 * most conservative of all the horizons calculated for any of the
 			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
+			 * queries on the standby older than the youngest xid of the most
+			 * recently removed tuple this record will prune will conflict. If
+			 * this record will freeze tuples, any queries on the standby with
+			 * xids older than the youngest tuple this record will freeze will
+			 * conflict.
 			 */
 			TransactionId conflict_xid;
 
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
+			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
 									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
+				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 24a27cc043a..d083f825b39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -208,6 +208,14 @@ typedef struct HeapPageFreeze
 	TransactionId FreezePageRelfrozenXid;
 	MultiXactId FreezePageRelminMxid;
 
+	/*
+	 * The youngest XID that will be frozen or removed during freezing. It is
+	 * used to calculate the snapshot conflict horizon for a WAL record
+	 * freezing tuples. Because it is only used if we do end up freezing
+	 * tuples, there is no need for a "no freeze" version.
+	 */
+	TransactionId FreezePageConflictXid;
+
 	/*
 	 * "No freeze" NewRelfrozenXid/NewRelminMxid trackers.
 	 *
-- 
2.43.0



  [text/x-patch] v36-0002-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch (6.2K, 3-v36-0002-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch)
  download | inline diff:
From 9feb39bc384053606879563e81c83920ab6c5568 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:23:57 -0500
Subject: [PATCH v36 02/16] Save vmbuffer in heap-specific scan descriptors for
 on-access pruning

Future commits will use the visibility map in on-access pruning to avoid
pruning when a page is all-visible, fix VM corruption, and set the VM if
the page is all-visible.

Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c         | 12 +++++++++++-
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++--
 src/backend/access/heap/pruneheap.c      |  6 +++++-
 src/include/access/heapam.h              | 19 ++++++++++++++++---
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 649ee6e7669..54cd8d6a497 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1310,6 +1310,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1348,6 +1349,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1380,6 +1387,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..47624194f93 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								&hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2533,7 +2541,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index eebd6cf57ea..8b5044567bf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -214,9 +214,13 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * This function may pin *vmbuffer. It's passed by reference so the caller can
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
+ * responsible for unpinning it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d083f825b39..281cdd5ee59 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. The current heap block's
+	 * corresponding page in the visibility map.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +122,14 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/* Current heap block's corresponding page in the visibility map */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -418,7 +430,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
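
The header comment in this patch says heap_page_prune_opt() takes vmbuffer by reference so the caller can reuse the pin across calls. The following is only a rough sketch of that idea, not PostgreSQL code: ToyBuffer, toy_vm_pin(), toy_scan() and the value of TOY_HEAPBLOCKS_PER_VM_PAGE are all hypothetical stand-ins. A pin is taken only when the scan crosses into a heap block covered by a different VM page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
typedef int ToyBuffer;
#define TOY_INVALID_BUFFER 0
#define TOY_HEAPBLOCKS_PER_VM_PAGE 1000u	/* assumed value, illustration only */

static int	toy_pin_count = 0;	/* number of pins actually taken */

/* Map a heap block number to the toy VM page that covers it. */
static uint32_t
toy_vm_page_for(uint32_t heap_blk)
{
	return heap_blk / TOY_HEAPBLOCKS_PER_VM_PAGE;
}

/*
 * Make *vmbuf refer to the VM page covering heap_blk. A toy buffer is just
 * "VM page number + 1" so that 0 can mean "no pin held". If the caller
 * already holds the right page, nothing happens and the pin is reused.
 */
static void
toy_vm_pin(uint32_t heap_blk, ToyBuffer *vmbuf)
{
	ToyBuffer	wanted = (ToyBuffer) toy_vm_page_for(heap_blk) + 1;

	assert(vmbuf != NULL);

	if (*vmbuf == wanted)
		return;					/* same VM page as last time: reuse the pin */

	/* A real implementation would release the old pin here. */
	*vmbuf = wanted;
	toy_pin_count++;
}

/* Scan heap blocks [0, nblocks) and return how many pins were taken. */
static int
toy_scan(uint32_t nblocks)
{
	ToyBuffer	vmbuf = TOY_INVALID_BUFFER;	/* lives in the "scan descriptor" */

	toy_pin_count = 0;
	for (uint32_t blk = 0; blk < nblocks; blk++)
		toy_vm_pin(blk, &vmbuf);
	return toy_pin_count;
}
```

With the assumed 1000 heap blocks per VM page, toy_scan(2500) takes three pins rather than 2500, which is the effect the by-reference parameter is meant to allow.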



  [text/x-patch] v36-0003-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 4-v36-0003-Fix-visibility-map-corruption-in-more-cases.patch)
From fb34ed4466dab85fb16c948fda3773f5a590014c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v36 03/16] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples or tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and a
single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)
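
The commit message above describes two kinds of inconsistency that pruning now detects. As a simplified model only (not the code in this patch), they can be sketched as follows. ToyPage, ToyVmFinding and the toy_* functions are hypothetical names, and locking, WAL and buffer dirtying are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bit values for illustration; treat them as assumptions. */
#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02
#define TOY_VM_VALID_BITS	(TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN)

typedef enum ToyVmFinding
{
	TOY_VM_OK,
	TOY_VM_BIT_SET_BUT_PAGE_HINT_CLEAR,	/* VM says visible, page does not */
	TOY_PAGE_ALL_VISIBLE_HAS_BAD_ITEM	/* page says visible, item disagrees */
} ToyVmFinding;

typedef struct ToyPage
{
	bool		page_all_visible;	/* models the page-level hint */
	uint8_t		vmbits;				/* models the VM bits for this page */
} ToyPage;

/* Check made once, before any item on the page is examined. */
static ToyVmFinding
toy_check_before_scan(const ToyPage *p)
{
	if ((p->vmbits & TOY_VM_VALID_BITS) && !p->page_all_visible)
		return TOY_VM_BIT_SET_BUT_PAGE_HINT_CLEAR;
	return TOY_VM_OK;
}

/*
 * Check made when an item is found that must not exist on an all-visible
 * page: an LP_DEAD item, a dead tuple, or a tuple from an in-progress
 * transaction.
 */
static ToyVmFinding
toy_check_bad_item(const ToyPage *p)
{
	return p->page_all_visible ? TOY_PAGE_ALL_VISIBLE_HAS_BAD_ITEM : TOY_VM_OK;
}

/* Repair: clear the page-level hint and the VM bits together. */
static void
toy_repair(ToyPage *p)
{
	assert(p != NULL);
	p->page_all_visible = false;
	p->vmbits = 0;
}
```

In this model, every finding other than TOY_VM_OK is followed by toy_repair(), after which both checks report TOY_VM_OK for the page.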

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8b5044567bf..6eca1474a2f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -121,6 +121,21 @@ typedef struct
 	 */
 	TransactionId frz_conflict_horizon;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -175,6 +190,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -182,7 +198,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -216,8 +233,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -284,6 +302,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -291,14 +319,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -361,6 +382,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -777,6 +804,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items, nor can
+ * it have tuples that are not visible to all running transactions. This
+ * function clears the offending bits and resets prstate->vmbits.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck now that we have the buffer lock before concluding that the
+		 * VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -837,6 +948,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -980,6 +1095,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1302,7 +1418,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1312,6 +1429,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1395,6 +1519,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1534,7 +1667,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1549,6 +1683,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1573,7 +1711,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1639,6 +1778,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 281cdd5ee59..568358a060a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -258,6 +258,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -320,6 +326,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits holds the heap page's visibility map bits as read at the start
+	 * of pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v36-0004-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 5-v36-0004-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From ed9509518d8b5a0772133d13e9714153dd526858 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v36 04/16] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid that wasted work, exit early if the page is already
all-frozen, or if it is all-visible and no freezing will be attempted.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
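
The early-exit rule in the commit message above can be restated as a small standalone predicate. This is a sketch under assumptions, not the patch itself: the TOY_* bit values, ToyLpState and both functions are hypothetical. The second function mirrors the idea that, on an all-visible page, counting normal line pointers is enough for a live tuple count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bit values for illustration; treat them as assumptions. */
#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02

/* Toy line pointer states. */
typedef enum ToyLpState
{
	TOY_LP_UNUSED,
	TOY_LP_NORMAL,
	TOY_LP_REDIRECT,
	TOY_LP_DEAD
} ToyLpState;

/*
 * Pruning and freezing can be skipped if the page is already all-frozen,
 * or if it is all-visible and the caller will not attempt freezing.
 */
static bool
toy_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	if (vmbits & TOY_VM_ALL_FROZEN)
		return true;
	return (vmbits & TOY_VM_ALL_VISIBLE) && !attempt_freeze;
}

/* On an all-visible page, every normal item counts as a live tuple. */
static int
toy_count_live(const ToyLpState *items, int nitems)
{
	int			live = 0;

	assert(nitems >= 0);
	for (int i = 0; i < nitems; i++)
	{
		if (items[i] == TOY_LP_NORMAL)
			live++;
	}
	return live;
}
```

Note that an all-visible but not all-frozen page does not qualify when freezing will be attempted, since there may still be freezing work to do on it.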

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6eca1474a2f..2cd684873c0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -191,6 +191,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -889,6 +890,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -952,6 +1015,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0



  [text/x-patch] v36-0005-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 6-v36-0005-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 89a76ceedd251e74742452f1b6fa57653c7219b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v36 05/16] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)
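
The commit message above argues that one visibility check per page is enough, because the newest live xmin bounds all the others. A toy version of that argument follows. Loud assumptions: ToyXid, TOY_FIRST_NORMAL_XID and the toy_* functions are hypothetical, the visibility test is reduced to a single horizon XID (the real GlobalVisState is more involved), and every inserter is assumed to have committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID			0u
#define TOY_FIRST_NORMAL_XID	3u	/* smaller values are treated as special */

/* Wraparound-aware comparisons in the style of modulo-2^32 XID arithmetic. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Toy visibility test: an XID may still be running unless it precedes horizon. */
static bool
toy_xid_maybe_running(ToyXid horizon, ToyXid xid)
{
	return !toy_xid_precedes(xid, horizon);
}

/*
 * Track the newest normal xmin while scanning, then test only that one.
 * *nchecks counts how many times the (expensive) visibility test ran.
 */
static bool
toy_page_all_visible(const ToyXid *xmins, int n, ToyXid horizon, int *nchecks)
{
	ToyXid		newest = TOY_INVALID_XID;

	assert(nchecks != NULL);

	for (int i = 0; i < n; i++)
	{
		if (xmins[i] < TOY_FIRST_NORMAL_XID)
			continue;			/* e.g. frozen: visible to everyone */
		if (newest == TOY_INVALID_XID || toy_xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	if (newest == TOY_INVALID_XID)
		return true;			/* no normal xmins, nothing to test */

	(*nchecks)++;				/* exactly one test per page */
	return !toy_xid_maybe_running(horizon, newest);
}
```

If the newest xmin on the page precedes the horizon, every older xmin does too, so the per-tuple comparisons of the old approach collapse into one comparison at the end of the page.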

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..aee88947393 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1060,6 +1060,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2cd684873c0..f7e9fd51ac9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1035,6 +1035,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1702,29 +1713,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 568358a060a..849ed82bcf2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -475,6 +475,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v36-0006-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 7-v36-0006-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From eedd45cba83b0ff220b03235bbb48af661e7dc92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v36 06/16] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)
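
Reader's note on the bookkeeping described in the commit message above: the
standalone sketch below models it outside PostgreSQL. It is only an
illustration under stated assumptions. TupleModel, PageScanResult,
scan_page() and the single `horizon` argument are hypothetical; in
particular, `horizon` is a much-simplified stand-in for the GlobalVisState
test that the patch performs with GlobalVisTestXidMaybeRunning(). The point
it shows is that the newest live xmin keeps being tracked after the page is
known not to be all-visible, and that the "might this XID still be running?"
question is asked once, after the scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical model of one live tuple; not a PostgreSQL struct. */
typedef struct
{
	TransactionId xmin;
	bool		xmin_committed; /* models the xmin-committed hint bit */
} TupleModel;

typedef struct
{
	bool		set_all_visible;
	TransactionId newest_live_xid;
} PageScanResult;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Assumption: an XID may still be running for some snapshot unless it is
 * strictly older than `horizon`. The real test consults a GlobalVisState.
 */
static bool
xid_maybe_running(TransactionId horizon, TransactionId xid)
{
	return !xid_follows(horizon, xid);
}

static PageScanResult
scan_page(const TupleModel *tuples, size_t ntuples, TransactionId horizon)
{
	PageScanResult res = {true, InvalidTransactionId};

	for (size_t i = 0; i < ntuples; i++)
	{
		if (!tuples[i].xmin_committed)
		{
			/* Not all-visible, but keep scanning and keep tracking. */
			res.set_all_visible = false;
			continue;
		}

		/* Track the newest xmin regardless of set_all_visible. */
		if (xid_is_normal(tuples[i].xmin) &&
			xid_follows(tuples[i].xmin, res.newest_live_xid))
			res.newest_live_xid = tuples[i].xmin;
	}

	/* One check at the end, against the newest xmin only. */
	if (res.set_all_visible &&
		xid_is_normal(res.newest_live_xid) &&
		xid_maybe_running(horizon, res.newest_live_xid))
		res.set_all_visible = false;

	return res;
}
```

In this model, a page holding committed tuples with xmins 100 and 150 plus
one tuple whose xmin is not hinted committed comes back with
set_all_visible = false and newest_live_xid = 150, so the value is never
stale even though the page cannot be set all-visible.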

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f7e9fd51ac9..0de14a468f6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -136,6 +136,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -167,11 +170,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -181,7 +179,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -442,53 +439,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze ? true : false;
 }
 
 /*
@@ -718,7 +697,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -973,9 +951,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1041,9 +1018,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1194,7 +1171,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1654,6 +1631,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1701,32 +1679,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



  [text/x-patch] v36-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (27.1K, 8-v36-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From a96825f9026c4e9d8c8f55633b0e6dcf6f83c156 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v36 07/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 325 ++++++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 107 +--------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 271 insertions(+), 199 deletions(-)
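
Reader's note: the two decisions this patch adds to
heap_page_prune_and_freeze() can be modeled in isolation, as in the sketch
below. This is not the patch's code. choose_new_vmbits() and
choose_conflict_xid() are hypothetical stand-ins for heap_page_will_set_vm()
and get_conflict_xid(), the `is_vacuum` flag replaces the PruneReason
argument, and XIDs are plain uint32_t values compared the way
TransactionIdFollows() compares them. The sketch assumes the same ordering
of checks as the diff below: first decide which VM bits to set, then pick
the newest of the horizons implied by setting the VM, freezing, and pruning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Hypothetical stand-in for heap_page_will_set_vm(): returns the VM bits to
 * set, or 0 if the VM should be left alone.
 */
static uint8_t
choose_new_vmbits(bool is_vacuum, bool set_all_visible, bool set_all_frozen,
				  uint8_t old_vmbits)
{
	uint8_t		new_vmbits;

	/* On-access pruning never sets the VM in this patch. */
	if (!is_vacuum || !set_all_visible)
		return 0;

	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (set_all_frozen)
		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	/* Nothing to do if the VM already holds exactly these bits. */
	return (new_vmbits == old_vmbits) ? 0 : new_vmbits;
}

/* Hypothetical stand-in for get_conflict_xid(), same order of checks. */
static TransactionId
choose_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId newest_frozen_xid,
					TransactionId newest_live_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Already all-visible page merely becoming all-frozen: no horizon. */
	if (!do_prune && !do_freeze &&
		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
		return InvalidTransactionId;

	/* Otherwise take the newest horizon of everything the record does. */
	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(newest_frozen_xid, conflict_xid))
		conflict_xid = newest_frozen_xid;
	if (xid_follows(latest_xid_removed, conflict_xid))
	{
		assert(do_prune);
		conflict_xid = latest_xid_removed;
	}

	return conflict_xid;
}
```

For example, with old_vmbits = VISIBILITYMAP_ALL_VISIBLE and a page found to
be all-frozen without any pruning or freezing, choose_new_vmbits() returns
both bits and choose_conflict_xid() returns InvalidTransactionId. If the
record also removes a tuple whose XID is newer than the newest live xmin,
that removed XID becomes the horizon for the whole record.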

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0de14a468f6..ec58f717c0b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -129,12 +144,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -164,21 +183,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -216,6 +220,12 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId newest_frozen_xid,
+									  TransactionId newest_live_xid);
 
 
 /*
@@ -382,9 +392,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -783,6 +794,66 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed,
+				 TransactionId newest_frozen_xid,
+				 TransactionId newest_live_xid)
+{
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be seen as frozen by all MVCC snapshots on the standby (any
+	 * conflict would have been handled in reaction to the WAL record freezing
+	 * those tuples).
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshot conflict horizon for the whole record should be the most
+	 * conservative (newest) of all the horizons calculated for any of the
+	 * possible modifications. If this record will prune tuples, any queries
+	 * on the standby with xmin older than the youngest XID of the most
+	 * recently removed tuple this record will prune will conflict.  If this
+	 * record will freeze tuples, any queries on the standby with xmin older
+	 * than the youngest tuple this record will freeze will conflict.
+	 *
+	 * If we are setting the VM, the conflict horizon is almost always the
+	 * newest live XID, except in the situation described above.
+	 *
+	 * By picking the newest of all of those, we can ensure that all changes
+	 * in the record have been taken into account.
+	 */
+	if (do_set_vm)
+		conflict_xid = newest_live_xid;
+	if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
+		conflict_xid = newest_frozen_xid;
+
+	/*
+	 * If we are removing tuples with a younger XID than our so far calculated
+	 * conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+	{
+		Assert(do_prune);
+		conflict_xid = latest_xid_removed;
+	}
+
+	return conflict_xid;
+}
+
 /*
  * Helper to fix visibility-related corruption on a heap page and its
  * corresponding VM page. An all-visible page cannot have dead items nor can
@@ -847,7 +918,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -865,7 +936,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -894,15 +1001,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -932,7 +1037,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -947,12 +1053,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed, and if the page
+ * is found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -980,15 +1084,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -997,8 +1103,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1068,6 +1174,25 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.old_vmbits, prstate.new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.pagefrz.FreezePageConflictXid,
+									prstate.newest_live_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1089,14 +1214,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1110,6 +1238,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1117,29 +1266,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * queries on the standby older than the youngest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the youngest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1149,33 +1281,70 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+	else
+	{
+		presult->new_all_visible_pages = 0;
+		presult->new_all_frozen_pages = 0;
+		presult->new_all_visible_frozen_pages = 0;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 849ed82bcf2..7ef4cbbfb1e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -260,7 +260,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -276,8 +277,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -311,26 +311,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and/or all-frozen in
+	 * the VM during phase I of vacuuming (each counter is 0 or 1).
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -467,7 +453,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
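
For readers following the counter bookkeeping that this patch moves from lazy_scan_prune() into heap_page_prune_and_freeze(), here is a minimal standalone sketch of the classification as I read it from the hunk above. It is not code from the patch: VmCounts, classify_vm_change() and the VM_* macros are hypothetical names introduced only for illustration, and the bit values are local stand-ins for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VM flag bits; assumed values, for illustration. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Hypothetical mirror of the three per-page counters in PruneFreezeResult. */
typedef struct VmCounts
{
	unsigned	new_all_visible_pages;
	unsigned	new_all_visible_frozen_pages;
	unsigned	new_all_frozen_pages;
} VmCounts;

/*
 * Classify one page. old_vmbits is the VM state before pruning,
 * set_all_frozen says whether the page is being set all-frozen now, and
 * do_set_vm says whether the VM is being updated at all.
 */
static VmCounts
classify_vm_change(bool do_set_vm, uint8_t old_vmbits, bool set_all_frozen)
{
	VmCounts	counts = {0, 0, 0};

	/* Only the two flag bits are meaningful here. */
	assert((old_vmbits & ~(VM_ALL_VISIBLE | VM_ALL_FROZEN)) == 0);

	/* If the VM is not being set, nothing is newly visible or frozen. */
	if (!do_set_vm)
		return counts;

	if ((old_vmbits & VM_ALL_VISIBLE) == 0)
	{
		/* Page was not all-visible before: count it as newly visible. */
		counts.new_all_visible_pages = 1;
		if (set_all_frozen)
			counts.new_all_visible_frozen_pages = 1;
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 && set_all_frozen)
	{
		/* Page was already all-visible and is only now becoming frozen. */
		counts.new_all_frozen_pages = 1;
	}

	return counts;
}
```

With this shape the caller only has to add the three values to its running totals, and "was the page newly frozen" is simply whether either of the two frozen counters is nonzero, which matches how lazy_scan_prune() derives *vm_page_frozen in the hunk above.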



  [text/x-patch] v36-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 9-v36-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 89fde835f59045ae7490cbbdcfc461bef5c24841 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v36 08/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v36-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 10-v36-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From d9acfde0775edefb463df2b373b24cafdd8ba531 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v36 09/16] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..a7005b57e61 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 54cd8d6a497..149cffd1a57 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8890,50 +8890,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 6d39a5fff7c..a83f6b03d69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1367,9 +1230,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ec58f717c0b..184d7e98064 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1255,8 +1255,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3250564d4ff..3bbbdc62743 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4357,7 +4357,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v36-0010-Initialize-missing-fields-in-CreateExecutorState.patch (1.0K, 11-v36-0010-Initialize-missing-fields-in-CreateExecutorState.patch)
From 81e2ffc119e0409e40da26b5ad6cd145eecc6ac3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 16:48:19 -0500
Subject: [PATCH v36 10/16] Initialize missing fields in CreateExecutorState()

d47cbf474ecbd449a4 forgot to initialize a few fields it introduced in
the EState, so do that now.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execUtils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..cd4d5452cfb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,9 @@ CreateExecutorState(void)
 	estate->es_rteperminfos = NIL;
 	estate->es_plannedstmt = NULL;
 	estate->es_part_prune_infos = NIL;
+	estate->es_part_prune_states = NIL;
+	estate->es_part_prune_results = NIL;
+	estate->es_unpruned_relids = NULL;
 
 	estate->es_junkFilter = NULL;
 
-- 
2.43.0



  [text/x-patch] v36-0011-Track-which-relations-are-modified-by-a-query.patch (5.5K, 12-v36-0011-Track-which-relations-are-modified-by-a-query.patch)
From 9f7098111d8520a955b0b4d6d4d62c4a79a5497c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v36 11/16] Track which relations are modified by a query

Save the range table indexes of relations the query may modify in a
bitmapset in the EState. A later commit will pass this information down
to scan nodes to control whether a scan may set the visibility map
during on-access pruning. We don't want to set the visibility map if the
query is just going to modify the page immediately afterward.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..57dcdeda056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3033,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3165,6 +3179,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..0f8364b8720 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +925,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..05f032baeaa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -703,6 +703,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v36-0012-Thread-flags-through-begin-scan-APIs.patch (21.5K, 13-v36-0012-Thread-flags-through-begin-scan-APIs.patch)
From 29cef07ed9ec1858fd81957f2b7a8b422ec81969 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v36 12/16] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  6 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 ++++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 ++++----
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      |  4 ++--
 src/backend/executor/nodeSeqscan.c        |  6 +++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/tableam.h              | 17 +++++++++--------
 22 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 146ee97a47d..de835604cbd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2845,7 +2845,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c7e38dbe193..d48c85e895c 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2061,7 +2061,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 47624194f93..ebe2e87a28b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 43f64a0e721..1827208396c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 69ef1527e06..bc4eedba4ac 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..b3aeee36ce6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..900199dbe29 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1159,7 +1159,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 85242dcc245..09796fa4307 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6388,7 +6388,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13765,7 +13765,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22667,7 +22667,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23131,7 +23131,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index e0b6df64767..b3b6da3d7e4 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,7 +108,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..a7af2f6628a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e881e4f82a0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -420,7 +420,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +894,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +939,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1139,7 +1139,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1175,7 +1176,7 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1186,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
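
The flag plumbing in the patch above can be shown outside PostgreSQL. What follows is a minimal sketch, not the patch's code: the `DEMO_*` constants and `demo_seqscan_flags()` are hypothetical stand-ins, and the bit positions are illustrative only. It shows the pattern the new `table_beginscan()` signature uses, where the caller's hint bits are OR'ed together with the bits the wrapper always sets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ScanOptions bits; values are illustrative. */
#define DEMO_SO_TYPE_SEQSCAN		(1u << 0)
#define DEMO_SO_ALLOW_STRAT			(1u << 4)
#define DEMO_SO_ALLOW_SYNC			(1u << 5)
#define DEMO_SO_ALLOW_PAGEMODE		(1u << 6)
#define DEMO_SO_HINT_REL_READ_ONLY	(1u << 10)

/* Bits the wrapper always sets for a sequential scan. */
#define DEMO_SEQSCAN_FIXED \
	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_STRAT | \
	 DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE)

/*
 * Mirrors the shape of the patched wrapper: take the caller's flags and
 * add the fixed ones, rather than starting from a local constant.
 */
static inline uint32_t
demo_seqscan_flags(uint32_t flags)
{
	flags |= DEMO_SEQSCAN_FIXED;

	/* Caller hints survive; the fixed bits are always present. */
	assert((flags & DEMO_SEQSCAN_FIXED) == DEMO_SEQSCAN_FIXED);
	return flags;
}
```

Passing 0, as every call site in this patch does, yields exactly the fixed bits, so behavior is unchanged until a caller supplies a hint.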



  [text/x-patch] v36-0013-Pass-down-information-on-table-modification-to-s.patch (8.0K, 14-v36-0013-Pass-down-information-on-table-modification-to-s.patch)
From 099ea3c46847196eba132344ce861f9d74b01be0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v36 13/16] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 +++++++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              |  3 +++
 7 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ebe2e87a28b..3a8eb9d8b61 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index b3b6da3d7e4..9bcf9a68183 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -104,11 +104,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index cf4d9a4f832..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a7af2f6628a..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ef4cbbfb1e..c20218f8190 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -130,6 +130,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e881e4f82a0..51dfd122307 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
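
The read-only hint computed by each scan node in the patch above can be sketched in isolation. This is an illustration under stated assumptions, not PostgreSQL code: `DemoRelidSet` is a hypothetical 64-bit mask standing in for the `es_modified_relids` Bitmapset, and the `demo_*` helpers are invented for the example. The logic follows the patch: a scan gets the hint only when its `scanrelid` is absent from the modified set, and the heap side stores the negation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns to SO_HINT_REL_READ_ONLY. */
#define DEMO_SO_HINT_REL_READ_ONLY	(1u << 10)

/* Hypothetical stand-in for a Bitmapset: bit i set means relid i is modified. */
typedef uint64_t DemoRelidSet;

static bool
demo_relid_is_member(int relid, DemoRelidSet set)
{
	assert(relid >= 0 && relid < 64);
	return ((set >> relid) & 1) != 0;
}

/* What each scan node computes before beginning its scan. */
static uint32_t
demo_scan_flags(int scanrelid, DemoRelidSet modified_relids)
{
	uint32_t	flags = 0;

	if (!demo_relid_is_member(scanrelid, modified_relids))
		flags = DEMO_SO_HINT_REL_READ_ONLY;

	return flags;
}

/* What the index-fetch setup derives from the flags it receives. */
static bool
demo_modifies_base_rel(uint32_t flags)
{
	return !(flags & DEMO_SO_HINT_REL_READ_ONLY);
}
```

For example, with only relid 2 in the modified set, a scan of relid 1 receives the hint while a scan of relid 2 does not.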



  [text/x-patch] v36-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 15-v36-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From f597b757c55fee445b5f7f08d5cde55a38e197ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v36 14/16] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 41 +++++++++++++++----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 ++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 149cffd1a57..8273414b430 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY) != 0);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3a8eb9d8b61..673f6599613 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						(scan->rs_flags & SO_HINT_REL_READ_ONLY) != 0);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 184d7e98064..064264af1e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -220,7 +222,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
 									  uint8 old_vmbits, uint8 new_vmbits,
 									  TransactionId latest_xid_removed,
@@ -246,7 +249,8 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -328,6 +332,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -384,6 +390,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -946,21 +953,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're neither pruning nor freezing,
+	 * avoid setting the visibility map if it would newly dirty the heap page
+	 * or, if the page is already dirty, if doing so would require including
+	 * a full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1176,7 +1199,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c20218f8190..0a3e3df9b2d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -96,7 +97,8 @@ typedef struct HeapScanDescData
 
 	/*
 	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * corresponding page in the visibility map. If the relation is not
+	 * modified by the query, on-access pruning may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -128,7 +130,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -435,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v36-0015-Avoid-BufferGetPage-calls-in-heap_update.patch (5.6K, 16-v36-0015-Avoid-BufferGetPage-calls-in-heap_update.patch)
From bc15d26cb8ee817131e49cdc8f34eee9c1fb7cdc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 6 Mar 2026 16:46:01 -0500
Subject: [PATCH v36 15/16] Avoid BufferGetPage() calls in heap_update()

BufferGetPage() isn't cheap and heap_update() calls it multiple times
when it could just save the page from a single call. Do that.
While we are at it, make separate variables for old and new page in
heap_xlog_update(). It's confusing to reuse "page" for both pages.
---
 src/backend/access/heap/heapam.c      | 17 ++++++++------
 src/backend/access/heap/heapam_xlog.c | 34 ++++++++++++++-------------
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8273414b430..c39af2137c2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3339,7 +3339,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 	HeapTuple	heaptup;
 	HeapTuple	old_key_tuple = NULL;
 	bool		old_key_copied = false;
-	Page		page;
+	Page		page,
+				newpage;
 	BlockNumber block;
 	MultiXactStatus mxact_status;
 	Buffer		buffer,
@@ -4065,6 +4066,8 @@ l2:
 		heaptup = newtup;
 	}
 
+	newpage = BufferGetPage(newbuf);
+
 	/*
 	 * We're about to do the actual update -- check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -4179,17 +4182,17 @@ l2:
 	oldtup.t_data->t_ctid = heaptup->t_self;
 
 	/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
-	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
+	if (newbuf != buffer && PageIsAllVisible(newpage))
 	{
 		all_visible_cleared_new = true;
-		PageClearAllVisible(BufferGetPage(newbuf));
+		PageClearAllVisible(newpage);
 		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
 							vmbuffer_new, VISIBILITYMAP_VALID_BITS);
 	}
@@ -4220,9 +4223,9 @@ l2:
 								 all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
-			PageSetLSN(BufferGetPage(newbuf), recptr);
+			PageSetLSN(newpage, recptr);
 		}
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(page, recptr);
 	}
 
 	END_CRIT_SECTION();
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a83f6b03d69..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -685,7 +685,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	ItemPointerData newtid;
 	Buffer		obuffer,
 				nbuffer;
-	Page		page;
+	Page		opage,
+				npage;
 	OffsetNumber offnum;
 	ItemId		lp;
 	HeapTupleData oldtup;
@@ -749,15 +750,15 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 									  &obuffer);
 	if (oldaction == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(obuffer);
+		opage = BufferGetPage(obuffer);
 		offnum = xlrec->old_offnum;
-		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(page))
+		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(opage))
 			elog(PANIC, "offnum out of range");
-		lp = PageGetItemId(page, offnum);
+		lp = PageGetItemId(opage, offnum);
 		if (!ItemIdIsNormal(lp))
 			elog(PANIC, "invalid lp");
 
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
+		htup = (HeapTupleHeader) PageGetItem(opage, lp);
 
 		oldtup.t_data = htup;
 		oldtup.t_len = ItemIdGetLength(lp);
@@ -776,12 +777,12 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		htup->t_ctid = newtid;
 
 		/* Mark the page as a candidate for pruning */
-		PageSetPrunable(page, XLogRecGetXid(record));
+		PageSetPrunable(opage, XLogRecGetXid(record));
 
 		if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(opage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(opage, lsn);
 		MarkBufferDirty(obuffer);
 	}
 
@@ -796,8 +797,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	else if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE)
 	{
 		nbuffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(nbuffer);
-		PageInit(page, BufferGetPageSize(nbuffer), 0);
+		npage = BufferGetPage(nbuffer);
+		PageInit(npage, BufferGetPageSize(nbuffer), 0);
 		newaction = BLK_NEEDS_REDO;
 	}
 	else
@@ -829,10 +830,10 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		recdata = XLogRecGetBlockData(record, 0, &datalen);
 		recdata_end = recdata + datalen;
 
-		page = BufferGetPage(nbuffer);
+		npage = BufferGetPage(nbuffer);
 
 		offnum = xlrec->new_offnum;
-		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
+		if (PageGetMaxOffsetNumber(npage) + 1 < offnum)
 			elog(PANIC, "invalid max offset number");
 
 		if (xlrec->flags & XLH_UPDATE_PREFIX_FROM_OLD)
@@ -909,16 +910,17 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		/* Make sure there is no forward chain link in t_ctid */
 		htup->t_ctid = newtid;
 
-		offnum = PageAddItem(page, htup, newlen, offnum, true, true);
+		offnum = PageAddItem(npage, htup, newlen, offnum, true, true);
 		if (offnum == InvalidOffsetNumber)
 			elog(PANIC, "failed to add tuple");
 
 		if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(npage);
 
-		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+		/* needed to update FSM below */
+		freespace = PageGetHeapFreeSpace(npage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(npage, lsn);
 		MarkBufferDirty(nbuffer);
 	}
 
-- 
2.43.0



  [text/x-patch] v36-0016-Set-pd_prune_xid-on-insert.patch (10.4K, 17-v36-0016-Set-pd_prune_xid-on-insert.patch)
From 75cc5440779451b5ce177d5cd884c6f1f3109075 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v36 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run and set the VM
all-visible after a page is filled with newly inserted tuples the first
time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 14 +++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c39af2137c2..0b8313de2e7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 064264af1e1..0776cb6cfc2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1920,16 +1920,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-11 17:01  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-11 17:01 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Mar 6, 2026 at 6:33 PM Melanie Plageman
<[email protected]> wrote:
>
> Thanks for the review! Attached is v36. I've pushed some of the early
> patches in the set and this is what is left.

I've gone ahead and pushed another of the introductory commits.
Attached v37 has the remaining patches.

The one change is that I've removed get_conflict_xid(). I determined
that, in the current code, we cannot end up in the scenario where
we didn't prune or freeze and the page was already all-visible but not
all-frozen. The closest scenario would be one where the page was
all-frozen, we cleared the all-frozen bit because we did a SELECT FOR
UPDATE on one of the tuples, then vacuum freezes the page. Even though
we are just invalidating the xmax, it still counts as freezing.
However, in this case we will not advance the FreezePageConflictXid,
so the snapshot conflict horizon will still correctly be
InvalidTransactionId. I believe in all cases we will correctly set the
conflict horizon to InvalidTransactionId with this much simpler
conflict xid calculation:

    conflict_xid = InvalidTransactionId;
    if (do_set_vm)
        conflict_xid = prstate.newest_live_xid;
    if (do_freeze &&
        TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
                             conflict_xid))
        conflict_xid = prstate.pagefrz.FreezePageConflictXid;
    if (do_prune &&
        TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
        conflict_xid = prstate.latest_xid_removed;
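
For anyone who wants to poke at the calculation outside the tree, here
is a standalone sketch of the same logic. It is not code from the patch
set. ConflictInputs, compute_conflict_xid(), xid_is_normal() and
xid_follows() are hypothetical names made up for this illustration, and
xid_follows() is only a model of TransactionIdFollows() (a modulo-2^32
comparison for normal XIDs, a plain comparison otherwise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical stand-in for TransactionIdIsNormal(). */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/*
 * Hypothetical model of TransactionIdFollows(): true if a is logically
 * newer than b.  Normal XIDs compare modulo 2^32; special XIDs compare
 * as plain unsigned integers.
 */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical bundle of the inputs used by the snippet in the mail. */
typedef struct
{
	bool		do_set_vm;
	bool		do_freeze;
	bool		do_prune;
	TransactionId newest_live_xid;
	TransactionId freeze_page_conflict_xid;
	TransactionId latest_xid_removed;
} ConflictInputs;

/* Pick the newest XID among the actions we are actually taking. */
static TransactionId
compute_conflict_xid(const ConflictInputs *in)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (in->do_set_vm)
		conflict_xid = in->newest_live_xid;
	if (in->do_freeze &&
		xid_follows(in->freeze_page_conflict_xid, conflict_xid))
		conflict_xid = in->freeze_page_conflict_xid;
	if (in->do_prune &&
		xid_follows(in->latest_xid_removed, conflict_xid))
		conflict_xid = in->latest_xid_removed;

	return conflict_xid;
}
```

In this model, a page where do_freeze is true but the freeze conflict
xid was never advanced (the SELECT FOR UPDATE case described above)
still yields InvalidTransactionId, because an invalid XID never
"follows" another invalid XID.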

The only outstanding question I have is about pd_prune_xid on insert:

I think we only want to set pd_prune_xid on insert if the transaction
ID is normal. Bootstrap mode does call heap_insert(), so we need to
check the xid before setting it. The question, then, is whether we
want the same guard on replay. Bootstrap mode won't actually insert a
WAL record, so I don't think the check is strictly needed there.
However, I think it is better to have it for consistency with normal
mode.
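
To make the guard concrete, here is a small standalone model of it. It
is a sketch under assumptions, not the patch: PageModel,
page_set_prunable() and set_prune_hint_on_insert() are hypothetical
names, and page_set_prunable() only imitates what I understand the
PageSetPrunable() macro to do (assert the XID is normal, then keep the
oldest XID seen so far in pd_prune_xid).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define BootstrapTransactionId		((TransactionId) 1)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical one-field stand-in for a heap page header. */
typedef struct
{
	TransactionId pd_prune_xid;
} PageModel;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Model of TransactionIdPrecedes(): true if a is logically older than b. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* Model of PageSetPrunable(): only ever moves the hint to an older XID. */
static void
page_set_prunable(PageModel *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * The guard under discussion: skip the hint for non-normal XIDs (for
 * example bootstrap mode) and for tuples inserted already frozen.
 */
static void
set_prune_hint_on_insert(PageModel *page, TransactionId xid,
						 bool inserting_frozen)
{
	if (xid_is_normal(xid) && !inserting_frozen)
		page_set_prunable(page, xid);
}
```

Without the xid_is_normal() test in set_prune_hint_on_insert(), a
bootstrap-mode insert would trip the assertion inside
page_set_prunable() in this model, which is the reason for checking
before setting.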

- Melanie


Attachments:

  [text/x-patch] v37-0001-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch (6.2K, 2-v37-0001-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch)
From 399b94b6cdcadd95d018f51c97bbbf6e6bd26f7d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:23:57 -0500
Subject: [PATCH v37 01/15] Save vmbuffer in heap-specific scan descriptors for
 on-access pruning

Future commits will use the visibility map in on-access pruning to avoid
pruning when a page is all-visible, fix VM corruption, and set the VM if
the page is all-visible.

Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c         | 12 +++++++++++-
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++--
 src/backend/access/heap/pruneheap.c      |  6 +++++-
 src/include/access/heapam.h              | 19 ++++++++++++++++---
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8f1c11a9350..7ff9a930844 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1310,6 +1310,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1348,6 +1349,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1380,6 +1387,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 5137d2510ea..b6ed5938477 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								&hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2533,7 +2541,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6beeb6956e3..8d9f0694206 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -207,9 +207,13 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * This function may pin *vmbuffer. It's passed by reference so the caller can
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
+ * responsible for unpinning it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ad993c07311..2fdc50b865b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. The current heap block's
+	 * corresponding page in the visibility map.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +122,14 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/* Current heap block's corresponding page in the visibility map */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -422,7 +434,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v37-0002-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 3-v37-0002-Fix-visibility-map-corruption-in-more-cases.patch)
From 421fdc75faa283d435f4a1a3da7f322be0a8e0f4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v37 02/15] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..2a0d54136b6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption as well as resetting the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck now that we have the buffer lock before concluding that the
+		 * VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits holds the heap page's VM bits as read at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v37-0003-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 4-v37-0003-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 1acfb16425bc9adafde80c46ffd97c95b8a79571 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v37 03/15] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid wasting effort on such pages, exit early if the
page is already all-frozen, or if it is all-visible and no freezing will
be attempted.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2a0d54136b6..b35ebdc134d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -882,6 +883,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1008,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
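
For readers skimming the thread, the fast-path condition in the patch above can be restated as a standalone predicate. This is only a minimal sketch: the `SKETCH_VM_*` macros and `sketch_can_bypass_prune_freeze()` are hypothetical stand-ins for the visibility map flags and for the inline check added to heap_page_prune_and_freeze(); they are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the visibility map flag bits (assumed values). */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/*
 * Returns true when pruning/freezing can be skipped: either the page is
 * already all-frozen, or it is all-visible and the caller will not attempt
 * to freeze anyway.
 */
static bool
sketch_can_bypass_prune_freeze(uint8_t vmbits, bool attempt_freeze)
{
	if (vmbits & SKETCH_VM_ALL_FROZEN)
		return true;

	return (vmbits & SKETCH_VM_ALL_VISIBLE) != 0 && !attempt_freeze;
}
```

Note that the sketch assumes, as the patch comment requires, that any mismatch between the page-level hint and the VM bits has already been repaired before the predicate is evaluated.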



  [text/x-patch] v37-0004-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 5-v37-0004-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 6696f6cfa18216ade8943cc27e2c46a1ccc55e2b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v37 04/15] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b35ebdc134d..c5e036053d3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1028,6 +1028,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1695,29 +1706,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..bbb223dd0d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -479,6 +479,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
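
The "one horizon check per page" idea in the patch above can be illustrated with a small standalone model. This is a sketch under loud assumptions: `SketchXid`, `SketchTuple`, and all `sketch_*` functions are hypothetical, and the single `horizon` XID is a deliberate simplification standing in for GlobalVisState, which is more involved in PostgreSQL. The point is only that tracking the newest committed xmin during the scan lets a single comparison at the end decide page-level visibility.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;		/* hypothetical stand-in for TransactionId */

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

typedef struct SketchTuple
{
	SketchXid	xmin;			/* inserting transaction */
	bool		xmin_committed; /* is the inserter known committed? */
} SketchTuple;

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b"; both must be normal XIDs. */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) > 0;
}

/*
 * Simplified horizon test (assumption): an XID may still be running for
 * some snapshot unless it precedes the horizon.
 */
static bool
sketch_xid_maybe_running(SketchXid horizon, SketchXid xid)
{
	return (int32_t) (xid - horizon) >= 0;
}

/*
 * Scan the tuples, tracking only the newest normal xmin, then do a single
 * horizon comparison for the whole page. On an early "false" return,
 * *newest_live_xid reflects only the tuples examined so far.
 */
static bool
sketch_page_all_visible(const SketchTuple *tuples, size_t ntuples,
						SketchXid horizon, SketchXid *newest_live_xid)
{
	*newest_live_xid = SKETCH_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		SketchXid	xmin = tuples[i].xmin;

		if (!tuples[i].xmin_committed)
			return false;

		if (sketch_xid_is_normal(xmin) &&
			(!sketch_xid_is_normal(*newest_live_xid) ||
			 sketch_xid_follows(xmin, *newest_live_xid)))
			*newest_live_xid = xmin;
	}

	/* One comparison covers every tuple: older xmins are visible too. */
	if (sketch_xid_is_normal(*newest_live_xid) &&
		sketch_xid_maybe_running(horizon, *newest_live_xid))
		return false;

	return true;
}
```

The design choice mirrored here is that the per-tuple work is a cheap XID comparison, while the comparatively expensive horizon test runs at most once per page.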



  [text/x-patch] v37-0005-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 6-v37-0005-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From ba68efa89610e45a153591af875b9215bca0e7c7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v37 05/15] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. It also guarantees that the value is never stale.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5e036053d3..d9a06f3115c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -435,53 +432,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. If freezing is not attempted, set_all_frozen must start out
+	 * false, because heap_prepare_freeze_tuple() is never called.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -711,7 +690,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -966,9 +944,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,9 +1011,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1187,7 +1164,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1647,6 +1624,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1694,32 +1672,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0
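
A minimal standalone sketch of the "track newest xmin on page" step that
the rename above touches. It is not PostgreSQL code: Xid, xid_is_normal(),
xid_follows() and track_newest_live_xid() are hypothetical stand-ins,
modeled on TransactionId, TransactionIdIsNormal() and
TransactionIdFollows(), and it assumes 32-bit XIDs where values below 3
are special (invalid, bootstrap, frozen).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's TransactionId definitions. */
typedef uint32_t Xid;
#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

static bool
xid_is_normal(Xid xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b" for normal XIDs. */
static bool
xid_follows(Xid a, Xid b)
{
	/* Special XIDs are compared as plain unsigned values. */
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	/* Normal XIDs are compared modulo 2^32. */
	return (int32_t) (a - b) > 0;
}

/* Remember xmin if it is a normal XID newer than anything seen so far. */
static void
track_newest_live_xid(Xid *newest_live_xid, Xid xmin)
{
	assert(newest_live_xid != NULL);

	if (xid_follows(xmin, *newest_live_xid) && xid_is_normal(xmin))
		*newest_live_xid = xmin;
}
```

Starting from INVALID_XID, a frozen xmin leaves the value untouched, and
an xmin allocated just after wraparound counts as newer than one
allocated just before it.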



  [text/x-patch] v37-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 7-v37-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From b8e7bcf3ad1132b58c7f045465ed61da3a027475 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v37 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)
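
For readers skimming the diff, here is a rough standalone model of the
two decisions this patch moves into heap_page_prune_and_freeze(): which
VM bits to set, and which snapshot conflict horizon the combined WAL
record should carry. It is a sketch, not the patch itself. Xid,
VM_ALL_VISIBLE, VM_ALL_FROZEN, xid_follows(), choose_new_vmbits() and
record_conflict_xid() are hypothetical names; the real code operates on
PruneState and uses TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; names and values are illustrative only. */
typedef uint32_t Xid;
#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(Xid a, Xid b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Returns the VM bits to set, or 0 if the VM should be left alone. */
static uint8_t
choose_new_vmbits(bool on_access, bool set_all_visible, bool set_all_frozen,
				  uint8_t old_vmbits)
{
	uint8_t		new_vmbits;

	assert((old_vmbits & ~(VM_ALL_VISIBLE | VM_ALL_FROZEN)) == 0);

	/* On-access pruning never sets the VM in this patch. */
	if (on_access || !set_all_visible)
		return 0;

	new_vmbits = VM_ALL_VISIBLE;
	if (set_all_frozen)
		new_vmbits |= VM_ALL_FROZEN;

	/* Nothing to do if the VM already holds exactly these bits. */
	return (new_vmbits == old_vmbits) ? 0 : new_vmbits;
}

/* Newest horizon required by any change carried in the single record. */
static Xid
record_conflict_xid(bool do_set_vm, Xid newest_live_xid,
					bool do_freeze, Xid freeze_conflict_xid,
					bool do_prune, Xid latest_xid_removed)
{
	Xid			conflict_xid = INVALID_XID;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

Because one record now covers pruning, freezing and the VM update, the
horizon is the newest of the three candidates rather than a separate
value handed back to the caller for a second record.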

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9a06f3115c..479892b0808 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,7 +213,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -375,9 +379,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -840,7 +845,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -858,7 +863,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for the heap page using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -887,15 +928,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -925,7 +964,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -940,12 +980,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed, and if the page
+ * is found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -973,15 +1011,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -990,8 +1030,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1061,6 +1101,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * If we do not intend to set the VM, new_vmbits should be 0, regardless
+	 * of whether or not the page is all-visible.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1082,14 +1146,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1103,6 +1170,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1110,29 +1198,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1142,33 +1213,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bbb223dd0d2..f77a00291bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -264,7 +264,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -280,8 +281,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -315,26 +315,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -471,7 +457,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
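
A note on the bookkeeping this patch touches: the block removed from vacuumlazy.c above decided which of three counters to bump depending on the old and new VM bits. Below is a minimal standalone sketch of that decision, not the patch's actual code. `VmCounters` and `count_vm_transition()` are hypothetical names, and deriving "set all-frozen" from the new bits (rather than from `presult.set_all_frozen`) is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the three per-vacuum counters. */
typedef struct VmCounters
{
	uint32_t	new_all_visible_pages;
	uint32_t	new_all_visible_all_frozen_pages;
	uint32_t	new_all_frozen_pages;
} VmCounters;

/*
 * Account for one page whose VM bits change from old_vmbits to new_vmbits.
 * Returns true if the page was newly set all-frozen.
 */
static bool
count_vm_transition(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	/* Assumption: the caller wants all-frozen iff the new bits say so. */
	bool		set_all_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;

	/* Nothing to do */
	if (old_vmbits == new_vmbits)
		return false;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		/* Page was not all-visible before: count it as newly all-visible. */
		c->new_all_visible_pages++;
		if (set_all_frozen)
		{
			c->new_all_visible_all_frozen_pages++;
			return true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
	{
		/* Page was already all-visible and is only now becoming all-frozen. */
		c->new_all_frozen_pages++;
		return true;
	}

	return false;
}
```

The three counters are mutually exclusive per page except that a page going from "no bits" straight to all-visible plus all-frozen bumps both `new_all_visible_pages` and `new_all_visible_all_frozen_pages`, which matches the removed code.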



  [text/x-patch] v37-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v37-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 00c89d9283d8fcbdfc8f309a3903ffcacad7b11e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v37 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v37-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 9-v37-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 6c344aca95cf22851c300c96509a312a58b19e2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v37 08/15] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..a7005b57e61 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7ff9a930844..0d6e3bc7884 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8883,50 +8883,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 6d39a5fff7c..a83f6b03d69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1367,9 +1230,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 479892b0808..94be0348509 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1187,8 +1187,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3da19d41413..44948d6d611 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4360,7 +4360,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
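
For readers unfamiliar with the VM layout: once the WAL-logging variant is gone, the remaining visibilitymap_set() is essentially a bit-twiddling helper. Here is a rough standalone sketch of the addressing and the set operation it performs, using the same `map[mapByte] |= (flags << mapOffset)` expression that appears in the removed function above. The `vm_get_status()` and `vm_set_bits()` names are hypothetical. The macro definitions are written from memory of visibilitymap.c, and `MAPSIZE` assumes an 8 kB block with a 24-byte page header, so treat the exact numbers as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Two status bits per heap block, so four heap blocks per map byte. */
#define BITS_PER_HEAPBLOCK		2
#define HEAPBLOCKS_PER_BYTE		4

/* Assumed usable bytes in one VM page: 8192-byte block, 24-byte header. */
#define MAPSIZE					8168
#define HEAPBLOCKS_PER_PAGE		(MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Map a heap block number to its VM block, byte, and bit offset. */
#define HEAPBLK_TO_MAPBLOCK(x)	((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)	(((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)	(((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/* Read the two status bits for heapBlk from one VM page's map area. */
static uint8_t
vm_get_status(const uint8_t *map, uint32_t heapBlk)
{
	return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk))
		& VISIBILITYMAP_VALID_BITS;
}

/* OR the requested flags into the map; existing bits are never cleared. */
static void
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible. */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	if (flags != vm_get_status(map, heapBlk))
		map[HEAPBLK_TO_MAPBYTE(heapBlk)] |=
			(uint8_t) (flags << HEAPBLK_TO_OFFSET(heapBlk));
}
```

The sketch deliberately leaves out buffer locking, MarkBufferDirty() and LSN handling. In the patch those are the caller's responsibility, because the VM change is logged in the same record as the heap page change.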



  [text/x-patch] v37-0009-Initialize-missing-fields-in-CreateExecutorState.patch (1.0K, 10-v37-0009-Initialize-missing-fields-in-CreateExecutorState.patch)
From 52ca0331db0cdf58672562a912de9423217adab9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 16:48:19 -0500
Subject: [PATCH v37 09/15] Initialize missing fields in CreateExecutorState()

d47cbf474ecbd449a4 forgot to initialize a few fields it introduced in
the EState, so do that now.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execUtils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..cd4d5452cfb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,9 @@ CreateExecutorState(void)
 	estate->es_rteperminfos = NIL;
 	estate->es_plannedstmt = NULL;
 	estate->es_part_prune_infos = NIL;
+	estate->es_part_prune_states = NIL;
+	estate->es_part_prune_results = NIL;
+	estate->es_unpruned_relids = NULL;
 
 	estate->es_junkFilter = NULL;
 
-- 
2.43.0



  [text/x-patch] v37-0010-Track-which-relations-are-modified-by-a-query.patch (5.5K, 11-v37-0010-Track-which-relations-are-modified-by-a-query.patch)
From 8acf2bbb878ce445a061e0ab18edcd6b66099e55 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v37 10/15] Track which relations are modified by a query

Save the RT indexes of relations modified by the query in a bitmap in the
EState. A later commit will pass this information down to scan nodes to
control whether or not the scan allows setting the visibility map during
on-access pruning. We don't want to set the visibility map if the query
is just going to modify the page immediately after.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..57dcdeda056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3033,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3165,6 +3179,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..0f8364b8720 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +925,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..05f032baeaa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -703,6 +703,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
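
The bookkeeping in the patch above can be illustrated outside the executor.
What follows is only a rough standalone sketch, not code from the patch set:
RelidSet is a hypothetical 64-bit stand-in for Bitmapset, and ToyEState and
the toy_* functions are made-up names. Like the patch, it records an RT index
whenever a result relation or a rowmark is set up, then checks that the
expected set is a subset of the recorded set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: one bit per RT index (1..63). */
typedef uint64_t RelidSet;

static RelidSet
relids_add(RelidSet set, unsigned rti)
{
	assert(rti >= 1 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static bool
relids_is_member(RelidSet set, unsigned rti)
{
	return rti < 64 && (set & (UINT64_C(1) << rti)) != 0;
}

/* True if every member of a is also in b, like bms_is_subset(a, b). */
static bool
relids_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/* Toy executor state; the field names are illustrative only. */
typedef struct ToyEState
{
	RelidSet	modified_relids;	/* what the scan nodes would consult */
	RelidSet	result_rels;		/* targets of INSERT/UPDATE/DELETE/MERGE */
	RelidSet	rowmark_rels;		/* targets of SELECT FOR UPDATE etc. */
} ToyEState;

/* Opening a result relation also marks it as modified. */
static void
toy_init_result_relation(ToyEState *es, unsigned rti)
{
	es->result_rels = relids_add(es->result_rels, rti);
	es->modified_relids = relids_add(es->modified_relids, rti);
}

/* A relation with a rowmark may be modified, so mark it too. */
static void
toy_init_rowmark(ToyEState *es, unsigned rti)
{
	es->rowmark_rels = relids_add(es->rowmark_rels, rti);
	es->modified_relids = relids_add(es->modified_relids, rti);
}

/* The cross-check: everything expected must have been recorded. */
static bool
toy_cross_check(const ToyEState *es)
{
	RelidSet	expected = es->result_rels | es->rowmark_rels;

	return relids_is_subset(expected, es->modified_relids);
}
```

The check is a subset test rather than an equality test, so the recorded set
may hold extra members. Erring in that direction only means a scan is treated
as if its relation might be modified.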



  [text/x-patch] v37-0011-Thread-flags-through-begin-scan-APIs.patch (21.5K, 12-v37-0011-Thread-flags-through-begin-scan-APIs.patch)
From b1cf736a37b4c5ba7bb390585a7002b969b8abeb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v37 11/15] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  6 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 ++++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 ++++----
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      |  4 ++--
 src/backend/executor/nodeSeqscan.c        |  6 +++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/tableam.h              | 17 +++++++++--------
 22 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1909c3254b5..a221e032f5d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c7e38dbe193..d48c85e895c 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2061,7 +2061,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b6ed5938477..f4b169e2c04 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 43f64a0e721..1827208396c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 69ef1527e06..bc4eedba4ac 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..b3aeee36ce6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..900199dbe29 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1159,7 +1159,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 85242dcc245..09796fa4307 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6388,7 +6388,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13765,7 +13765,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22667,7 +22667,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23131,7 +23131,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index e0b6df64767..b3b6da3d7e4 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,7 +108,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..a7af2f6628a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e881e4f82a0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -420,7 +420,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +894,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +939,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1139,7 +1139,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1175,7 +1176,7 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1186,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
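
The wrapper pattern used in the patch above (take caller-supplied flags, OR in
the bits the wrapper always sets, pass the result on) can be shown in
isolation. This is a minimal sketch under stated assumptions, not the real
table AM API: the TOY_* constants, their bit values, ToyScanDesc and the
toy_* functions are all hypothetical, and TOY_SO_HINT_CALLER merely stands in
for whatever hint a follow-up patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real scan option bits are defined elsewhere. */
#define TOY_SO_TYPE_SEQSCAN		(UINT32_C(1) << 0)
#define TOY_SO_TYPE_BITMAPSCAN	(UINT32_C(1) << 1)
#define TOY_SO_ALLOW_PAGEMODE	(UINT32_C(1) << 2)
#define TOY_SO_HINT_CALLER		(UINT32_C(1) << 8)	/* placeholder caller hint */

typedef struct ToyScanDesc
{
	uint32_t	flags;			/* union of wrapper bits and caller bits */
} ToyScanDesc;

/* Common entry point: just records the flags it was handed. */
static ToyScanDesc
toy_beginscan_common(uint32_t flags)
{
	ToyScanDesc scan = {flags};

	return scan;
}

/* Sequential-scan wrapper: adds its own bits on top of the caller's. */
static ToyScanDesc
toy_beginscan(uint32_t flags)
{
	flags |= TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;
	return toy_beginscan_common(flags);
}

/* Bitmap-scan wrapper: same pattern with a different scan type bit. */
static ToyScanDesc
toy_beginscan_bm(uint32_t flags)
{
	flags |= TOY_SO_TYPE_BITMAPSCAN | TOY_SO_ALLOW_PAGEMODE;
	return toy_beginscan_common(flags);
}
```

Passing 0, as every call site in the patch does, yields exactly the flags the
wrapper computed before the change, which is why the patch can add the
parameter without changing behavior.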



  [text/x-patch] v37-0012-Pass-down-information-on-table-modification-to-s.patch (8.0K, 13-v37-0012-Pass-down-information-on-table-modification-to-s.patch)
From 1a3685e4cf28fa0668861e3e5b25cfa7cb216c85 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v37 12/15] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 +++++++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              |  3 +++
 7 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f4b169e2c04..098ca32fa84 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index b3b6da3d7e4..9bcf9a68183 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -104,11 +104,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index cf4d9a4f832..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a7af2f6628a..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f77a00291bb..caa5e9b4206 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -130,6 +130,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e881e4f82a0..51dfd122307 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
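
For readers following the flag plumbing in the diff above, here is a minimal standalone sketch of how each scan node derives SO_HINT_REL_READ_ONLY and how the index fetch path turns it back into a boolean. It is not the patch code. The helper names (rel_is_modified, scan_flags_for_rel, modifies_base_rel_from_flags) are hypothetical, and a 64-bit mask stands in for the Bitmapset that bms_is_member() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the diff adds to ScanOptions. */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for bms_is_member(scanrelid, es_modified_relids):
 * bit N of the mask is set when range table index N is modified by the query.
 */
static bool
rel_is_modified(uint64_t modified_relids, unsigned scanrelid)
{
    assert(scanrelid < 64);
    return (modified_relids & (UINT64_C(1) << scanrelid)) != 0;
}

/* Flags a scan node would pass down when it begins its scan. */
static uint32_t
scan_flags_for_rel(uint64_t modified_relids, unsigned scanrelid)
{
    uint32_t flags = 0;

    if (!rel_is_modified(modified_relids, scanrelid))
        flags |= SO_HINT_REL_READ_ONLY;

    return flags;
}

/* Mirrors how heapam_index_fetch_begin() fills modifies_base_rel. */
static bool
modifies_base_rel_from_flags(uint32_t flags)
{
    return !(flags & SO_HINT_REL_READ_ONLY);
}
```

The hint is only set when the relation is absent from the modified set, so a scan started with flags of 0 is treated conservatively as possibly modifying the relation.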



  [text/x-patch] v37-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.8K, 14-v37-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 56278ea42704a13fa6af34bb3dbb797170080e8b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v37 13/15] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 41 +++++++++++++++----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 ++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0d6e3bc7884..abc6fe904fb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 098ca32fa84..b8a2010c188 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 94be0348509..9f545a1eaf2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -233,7 +236,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -315,6 +319,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -371,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -873,21 +880,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1103,7 +1126,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index caa5e9b4206..21b640d459c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -96,7 +97,8 @@ typedef struct HeapScanDescData
 
 	/*
 	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * corresponding page in the visibility map. If the relation is not
+	 * modified by the query, on-access pruning may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -128,7 +130,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -439,7 +445,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
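
The decision made by heap_page_will_set_vm() in the patch above can be modelled as a pure function, which may make the heuristic easier to review. This is only a sketch under stated assumptions: the VmDecisionInput struct and vmbits_to_set() are hypothetical names, the buffer_dirty and needs_fpi booleans stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(), and it assumes the all-frozen bit is ORed in when set_all_frozen is true, which the hunk cuts off before showing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical flattened view of the inputs the patch consults. */
typedef struct
{
    bool attempt_set_vm;    /* caller passed HEAP_PAGE_PRUNE_SET_VM */
    bool set_all_visible;   /* pruning found the page all-visible */
    bool set_all_frozen;    /* pruning found the page all-frozen */
    bool on_access;         /* reason == PRUNE_ON_ACCESS */
    bool do_prune;          /* this call will actually prune */
    bool do_freeze;         /* this call will actually freeze */
    bool buffer_dirty;      /* stand-in for BufferIsDirty() */
    bool needs_fpi;         /* stand-in for XLogCheckBufferNeedsBackup() */
} VmDecisionInput;

/* Returns the VM bits to set, or 0 if the VM should be left alone. */
static uint8_t
vmbits_to_set(const VmDecisionInput *in)
{
    uint8_t bits;

    /* A page cannot be all-frozen without being all-visible. */
    assert(!in->set_all_frozen || in->set_all_visible);

    if (!in->attempt_set_vm || !in->set_all_visible)
        return 0;

    /*
     * On-access call that modifies nothing else: skip if setting the VM
     * would newly dirty the heap page or force a full-page image.
     */
    if (in->on_access && !in->do_prune && !in->do_freeze &&
        (!in->buffer_dirty || in->needs_fpi))
        return 0;

    bits = VISIBILITYMAP_ALL_VISIBLE;
    if (in->set_all_frozen)
        bits |= VISIBILITYMAP_ALL_FROZEN;

    return bits;
}
```

Unlike this sketch, the patch also clears set_all_visible and set_all_frozen in prstate when the heuristic declines, so later code does not act on them.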



  [text/x-patch] v37-0014-Avoid-BufferGetPage-calls-in-heap_update.patch (5.6K, 15-v37-0014-Avoid-BufferGetPage-calls-in-heap_update.patch)
From cb6b3b22f2a1b56ee9bdda8fd605ab2c956555b3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 6 Mar 2026 16:46:01 -0500
Subject: [PATCH v37 14/15] Avoid BufferGetPage() calls in heap_update()

BufferGetPage() isn't cheap and heap_update() calls it multiple times
when it could just save the page from a single call. Do that.
While we are at it, make separate variables for old and new page in
heap_xlog_update(). It's confusing to reuse "page" for both pages.
---
 src/backend/access/heap/heapam.c      | 17 ++++++++------
 src/backend/access/heap/heapam_xlog.c | 34 ++++++++++++++-------------
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index abc6fe904fb..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3339,7 +3339,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 	HeapTuple	heaptup;
 	HeapTuple	old_key_tuple = NULL;
 	bool		old_key_copied = false;
-	Page		page;
+	Page		page,
+				newpage;
 	BlockNumber block;
 	MultiXactStatus mxact_status;
 	Buffer		buffer,
@@ -4065,6 +4066,8 @@ l2:
 		heaptup = newtup;
 	}
 
+	newpage = BufferGetPage(newbuf);
+
 	/*
 	 * We're about to do the actual update -- check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -4179,17 +4182,17 @@ l2:
 	oldtup.t_data->t_ctid = heaptup->t_self;
 
 	/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
-	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
+	if (newbuf != buffer && PageIsAllVisible(newpage))
 	{
 		all_visible_cleared_new = true;
-		PageClearAllVisible(BufferGetPage(newbuf));
+		PageClearAllVisible(newpage);
 		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
 							vmbuffer_new, VISIBILITYMAP_VALID_BITS);
 	}
@@ -4220,9 +4223,9 @@ l2:
 								 all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
-			PageSetLSN(BufferGetPage(newbuf), recptr);
+			PageSetLSN(newpage, recptr);
 		}
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(page, recptr);
 	}
 
 	END_CRIT_SECTION();
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a83f6b03d69..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -685,7 +685,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	ItemPointerData newtid;
 	Buffer		obuffer,
 				nbuffer;
-	Page		page;
+	Page		opage,
+				npage;
 	OffsetNumber offnum;
 	ItemId		lp;
 	HeapTupleData oldtup;
@@ -749,15 +750,15 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 									  &obuffer);
 	if (oldaction == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(obuffer);
+		opage = BufferGetPage(obuffer);
 		offnum = xlrec->old_offnum;
-		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(page))
+		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(opage))
 			elog(PANIC, "offnum out of range");
-		lp = PageGetItemId(page, offnum);
+		lp = PageGetItemId(opage, offnum);
 		if (!ItemIdIsNormal(lp))
 			elog(PANIC, "invalid lp");
 
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
+		htup = (HeapTupleHeader) PageGetItem(opage, lp);
 
 		oldtup.t_data = htup;
 		oldtup.t_len = ItemIdGetLength(lp);
@@ -776,12 +777,12 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		htup->t_ctid = newtid;
 
 		/* Mark the page as a candidate for pruning */
-		PageSetPrunable(page, XLogRecGetXid(record));
+		PageSetPrunable(opage, XLogRecGetXid(record));
 
 		if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(opage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(opage, lsn);
 		MarkBufferDirty(obuffer);
 	}
 
@@ -796,8 +797,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	else if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE)
 	{
 		nbuffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(nbuffer);
-		PageInit(page, BufferGetPageSize(nbuffer), 0);
+		npage = BufferGetPage(nbuffer);
+		PageInit(npage, BufferGetPageSize(nbuffer), 0);
 		newaction = BLK_NEEDS_REDO;
 	}
 	else
@@ -829,10 +830,10 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		recdata = XLogRecGetBlockData(record, 0, &datalen);
 		recdata_end = recdata + datalen;
 
-		page = BufferGetPage(nbuffer);
+		npage = BufferGetPage(nbuffer);
 
 		offnum = xlrec->new_offnum;
-		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
+		if (PageGetMaxOffsetNumber(npage) + 1 < offnum)
 			elog(PANIC, "invalid max offset number");
 
 		if (xlrec->flags & XLH_UPDATE_PREFIX_FROM_OLD)
@@ -909,16 +910,17 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		/* Make sure there is no forward chain link in t_ctid */
 		htup->t_ctid = newtid;
 
-		offnum = PageAddItem(page, htup, newlen, offnum, true, true);
+		offnum = PageAddItem(npage, htup, newlen, offnum, true, true);
 		if (offnum == InvalidOffsetNumber)
 			elog(PANIC, "failed to add tuple");
 
 		if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(npage);
 
-		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+		/* needed to update FSM below */
+		freespace = PageGetHeapFreeSpace(npage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(npage, lsn);
 		MarkBufferDirty(nbuffer);
 	}
 
-- 
2.43.0



  [text/x-patch] v37-0015-Set-pd_prune_xid-on-insert.patch (10.4K, 16-v37-0015-Set-pd_prune_xid-on-insert.patch)
From ad64476c5afb9180ae99bf202681833d9dbbdfbe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v37 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This lets heap_page_prune_and_freeze() run the first time a page filled
with newly inserted tuples is read, so that read can set the page
all-visible in the VM.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 14 +++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9f545a1eaf2..9e51c961c3c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1849,16 +1849,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
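
As a reading aid for the heap_insert() hunk above, the guard that decides whether an insert sets pd_prune_xid can be sketched in isolation. The function names below are hypothetical. The constants are reproduced here for illustration only and should be checked against the tree (FirstNormalTransactionId in transam.h, HEAP_INSERT_FROZEN in heapam.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Illustrative values; xids below 3 are reserved (invalid, bootstrap, frozen). */
#define FirstNormalTransactionId ((TransactionId) 3)
#define HEAP_INSERT_FROZEN       0x0004

/* Same test that TransactionIdIsNormal() performs. */
static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/*
 * Hypothetical helper: should a single-tuple insert mark the page prunable?
 * Not in bootstrap mode (no normal xid) and not when inserting a frozen tuple.
 */
static bool
insert_should_set_prunable(TransactionId xid, int options)
{
    return xid_is_normal(xid) && !(options & HEAP_INSERT_FROZEN);
}
```

heap_multi_insert() applies the same idea, but tests all_frozen_set instead of the options bit.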




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-15 19:10  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-15 19:10 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 11, 2026 at 1:01 PM Melanie Plageman
<[email protected]> wrote:
>
> On Fri, Mar 6, 2026 at 6:33 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > Thanks for the review! Attached is v36. I've pushed some of the early
> > patches in the set and this is what is left.
>
> I've gone ahead and pushed another of the introductory commits.
> Attached v37 has the remaining patches.

I've pushed a few more of the trivial commits in the set. Attached v38
has the remaining patches.

- Melanie


Attachments:

  [text/x-patch] v38-0001-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 2-v38-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From 0ca92d2ccee0e589a35a79f9046c3a7900ecacf4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v38 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples or tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and a
single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..52cafb23c6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items, nor can
+ * it have tuples that are not visible to all running transactions. This
+ * function clears the VM bits and resets the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, for vacuum, it's possible
+		 * that the bit got cleared after heap_vac_scan_next_block() was
+		 * called, so we must recheck now that we have the buffer lock before
+		 * concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
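
The decision logic that 0001 moves into pruning can be sketched as a small
standalone model. This is not PostgreSQL code: ModelPage, VmProblem, the
MODEL_VM_* bits and model_fix_vm_corruption() are hypothetical stand-ins for
PruneState, the three WARNING cases, the VISIBILITYMAP_* flags and
heap_fix_vm_corruption(). Buffers, locking and error reporting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VISIBILITYMAP_* flag bits. */
#define MODEL_VM_ALL_VISIBLE 0x01
#define MODEL_VM_ALL_FROZEN  0x02
#define MODEL_VM_VALID_BITS  (MODEL_VM_ALL_VISIBLE | MODEL_VM_ALL_FROZEN)

/* Which inconsistency was reported (models the three WARNING messages). */
typedef enum VmProblem
{
	VM_PROBLEM_NONE,
	VM_PROBLEM_DEAD_ITEM_ON_ALL_VISIBLE_PAGE,
	VM_PROBLEM_NONVISIBLE_TUPLE_ON_ALL_VISIBLE_PAGE,
	VM_PROBLEM_VM_BIT_SET_BUT_PAGE_FLAG_CLEAR
} VmProblem;

/* Simplified page state: only the fields the check looks at. */
typedef struct ModelPage
{
	bool		page_all_visible;	/* models PD_ALL_VISIBLE */
	uint8_t		vmbits;				/* models the cached VM bits */
	int			lpdead_items;		/* dead line pointers seen so far */
} ModelPage;

/*
 * Called once an offending item (or a page/VM mismatch) has been found.
 * Classifies the problem, clears the page-level flag if it was set, and
 * always clears the modeled VM bits, as the patch does.
 */
VmProblem
model_fix_vm_corruption(ModelPage *page)
{
	VmProblem	problem = VM_PROBLEM_NONE;

	assert(page != NULL);

	if (page->page_all_visible)
	{
		problem = (page->lpdead_items > 0)
			? VM_PROBLEM_DEAD_ITEM_ON_ALL_VISIBLE_PAGE
			: VM_PROBLEM_NONVISIBLE_TUPLE_ON_ALL_VISIBLE_PAGE;
		page->page_all_visible = false;
	}
	else if (page->vmbits & MODEL_VM_VALID_BITS)
		problem = VM_PROBLEM_VM_BIT_SET_BUT_PAGE_FLAG_CLEAR;

	page->vmbits = 0;
	return problem;
}
```

In this model, as in the patch, the caller decides when to invoke the helper
(for example, on seeing a dead item while the page flag is set); the helper
itself only classifies and repairs.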



  [text/x-patch] v38-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 3-v38-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 4ebe52f1b060db395d8abe5255ea1a86ed4fdc4a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v38 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid this, if a page is already all-frozen or it is
all-visible and no freezing will be attempted, we exit early.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52cafb23c6b..a4a0a916f61 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -882,6 +883,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1008,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
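
The fast-path condition in 0002 and its live-tuple count can be sketched in
isolation. This is a simplified model under stated assumptions, not the
patch's code: FpItemState, the FP_VM_* bits, fp_can_bypass() and
fp_count_live() are hypothetical names, and the line pointer array stands in
for the ItemIds on a heap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VISIBILITYMAP_* flag bits. */
#define FP_VM_ALL_VISIBLE 0x01
#define FP_VM_ALL_FROZEN  0x02

/* Simplified line pointer states. */
typedef enum FpItemState
{
	FP_ITEM_UNUSED,
	FP_ITEM_NORMAL,
	FP_ITEM_REDIRECT,
	FP_ITEM_DEAD
} FpItemState;

/*
 * The page can skip pruning and freezing if it is already all-frozen, or
 * if it is all-visible and the caller will not attempt freezing.
 */
bool
fp_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	if (vmbits & FP_VM_ALL_FROZEN)
		return true;
	return (vmbits & FP_VM_ALL_VISIBLE) != 0 && !attempt_freeze;
}

/*
 * On an all-visible page every normal line pointer refers to a live tuple,
 * so counting normal items yields the live tuple count.
 */
int
fp_count_live(const FpItemState *items, size_t nitems)
{
	int			live = 0;

	assert(items != NULL || nitems == 0);

	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] == FP_ITEM_NORMAL)
			live++;
	}
	return live;
}
```

Note that an all-visible (but not all-frozen) page still takes the full path
when freezing is attempted, since there may be tuples left to freeze.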



  [text/x-patch] v38-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 4-v38-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 07396958d6588cd82ac420555b4d4b25194ced2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v38 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.
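
The deferred check described above can be sketched as a standalone model.
It assumes every tuple on the page was inserted by a committed transaction
and reduces GlobalVisState to a single hypothetical horizon XID; ModelXid
and the model_* functions are illustrative names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ModelXid;

#define MODEL_INVALID_XID      ((ModelXid) 0)
#define MODEL_FIRST_NORMAL_XID ((ModelXid) 3)

/* XIDs below the first normal XID model invalid/bootstrap/frozen values. */
static bool
model_xid_is_normal(ModelXid xid)
{
	return xid >= MODEL_FIRST_NORMAL_XID;
}

/* Circular (modulo 2^32) comparison, valid for normal XIDs only. */
static bool
model_xid_follows(ModelXid a, ModelXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Assumption: an XID at or after the horizon may still be seen as running. */
static bool
model_xid_maybe_running(ModelXid xid, ModelXid horizon)
{
	return (int32_t) (xid - horizon) >= 0;
}

/*
 * Track only the newest normal xmin while scanning, then test that one
 * value against the horizon after the loop. If the newest xmin is visible
 * to everyone, every older xmin on the page is too.
 */
bool
model_page_all_visible(const ModelXid *xmins, size_t ntuples, ModelXid horizon)
{
	ModelXid	newest = MODEL_INVALID_XID;

	assert(xmins != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		if (!model_xid_is_normal(xmins[i]))
			continue;			/* frozen tuples are visible to all */
		if (newest == MODEL_INVALID_XID ||
			model_xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	if (newest == MODEL_INVALID_XID)
		return true;			/* no normal xmin on the page */

	return !model_xid_maybe_running(newest, horizon);
}
```

The per-tuple work is a cheap XID comparison, and the more expensive
visibility test runs once per page rather than once per tuple.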

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a4a0a916f61..05fe3deeb95 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1028,6 +1028,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1695,29 +1706,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..bbb223dd0d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -479,6 +479,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
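
The deferred check introduced above can be sketched in isolation. This is a minimal illustration, not the patch's code: MockVisState, mock_xid_maybe_running() and page_all_visible_deferred() are hypothetical stand-ins for GlobalVisState, GlobalVisTestXidMaybeRunning() and the page scan, and the visibility horizon is reduced to a single XID. Only the wraparound-aware comparison is modeled on TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Wraparound-aware "id1 is newer than id2", modeled on TransactionIdFollows(). */
static inline bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* HYPOTHETICAL stand-in for GlobalVisState: one horizon XID. */
typedef struct MockVisState
{
	TransactionId maybe_running_from;	/* XIDs at or after this may be running */
} MockVisState;

/* HYPOTHETICAL stand-in for GlobalVisTestXidMaybeRunning(). */
static inline bool
mock_xid_maybe_running(const MockVisState *vis, TransactionId xid)
{
	return xid == vis->maybe_running_from ||
		xid_follows(xid, vis->maybe_running_from);
}

/*
 * Scan the xmins of the live tuples on a page, remembering only the newest
 * normal one. The horizon is consulted once, after the loop, rather than
 * once per tuple.
 */
static inline bool
page_all_visible_deferred(const TransactionId *xmins, size_t ntuples,
						  const MockVisState *vis,
						  TransactionId *newest_xmin)
{
	*newest_xmin = InvalidTransactionId;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (TransactionIdIsNormal(xmins[i]) &&
			xid_follows(xmins[i], *newest_xmin))
			*newest_xmin = xmins[i];
	}

	/* If the newest xmin may still be running, the page is not all-visible. */
	if (TransactionIdIsNormal(*newest_xmin) &&
		mock_xid_maybe_running(vis, *newest_xmin))
		return false;

	return true;
}
```

Checking only the newest xmin is sufficient in this model because any older xmin on the page is at least as far behind the horizon as the newest one.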



  [text/x-patch] v38-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 5-v38-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From a3211750778f6a8bec42edd25f5763e2ae31d21c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v38 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer, because the value can never be stale.

Since we now keep it up to date unconditionally, there's no reason not
to maintain set_all_visible for on-access pruning as well. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)
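
A rough sketch of the behavior described in the commit message, assuming a reduced state struct: MockPruneState, mock_prune_state_init() and mock_record_live_tuple() are hypothetical names, not functions from pruneheap.c. It shows that the newest live xmin keeps advancing after set_all_visible has been cleared, and that a tuple whose xmin is not hinted committed clears the flags without contributing its xmin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Wraparound-aware "id1 is newer than id2", modeled on TransactionIdFollows(). */
static inline bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* HYPOTHETICAL reduction of PruneState to the three fields of interest. */
typedef struct MockPruneState
{
	bool		set_all_visible;
	bool		set_all_frozen;
	TransactionId newest_live_xid;
} MockPruneState;

/* Start optimistic; all-frozen is only a candidate when freezing is attempted. */
static inline void
mock_prune_state_init(MockPruneState *st, bool attempt_freeze)
{
	st->set_all_visible = true;
	st->set_all_frozen = attempt_freeze;
	st->newest_live_xid = InvalidTransactionId;
}

/* Record one live tuple. There is no early exit on !set_all_visible. */
static inline void
mock_record_live_tuple(MockPruneState *st, TransactionId xmin,
					   bool xmin_hinted_committed)
{
	if (!xmin_hinted_committed)
	{
		/* Not yet known committed: page cannot be all-visible. */
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return;
	}

	/* Always track the newest normal xmin, whatever the flags say. */
	if (TransactionIdIsNormal(xmin) &&
		xid_follows(xmin, st->newest_live_xid))
		st->newest_live_xid = xmin;
}
```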

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05fe3deeb95..01c19ca8796 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -435,53 +432,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -711,7 +690,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -966,9 +944,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,9 +1011,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1187,7 +1164,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1647,6 +1624,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1694,32 +1672,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



  [text/x-patch] v38-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 6-v38-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 532fa6da3a3b691f0cafcc18a57ae2251a8a7725 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v38 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)
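
As a reading aid for the diff below, here is a self-contained sketch of the two decisions this patch adds: whether the VM bits change, and which conflict horizon the combined record carries. MockVmState, MockPruneReason, mock_will_set_vm() and mock_record_conflict_xid() are hypothetical names; the bit values are assumed to match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

#define MOCK_VM_ALL_VISIBLE	0x01	/* assumed value of VISIBILITYMAP_ALL_VISIBLE */
#define MOCK_VM_ALL_FROZEN	0x02	/* assumed value of VISIBILITYMAP_ALL_FROZEN */

typedef enum { MOCK_PRUNE_ON_ACCESS, MOCK_PRUNE_VACUUM_SCAN } MockPruneReason;

typedef struct MockVmState
{
	bool		set_all_visible;
	bool		set_all_frozen;
	uint8_t		old_vmbits;		/* VM bits before pruning */
	uint8_t		new_vmbits;		/* VM bits to set, or 0 if unchanged */
} MockVmState;

/* Wraparound-aware "id1 is newer than id2", modeled on TransactionIdFollows(). */
static inline bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Returns true only if the VM bits would actually change. */
static inline bool
mock_will_set_vm(MockVmState *st, MockPruneReason reason)
{
	st->new_vmbits = 0;

	if (reason == MOCK_PRUNE_ON_ACCESS || !st->set_all_visible)
		return false;

	st->new_vmbits = MOCK_VM_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= MOCK_VM_ALL_FROZEN;

	if (st->new_vmbits == st->old_vmbits)
	{
		st->new_vmbits = 0;		/* nothing to do */
		return false;
	}
	return true;
}

/* The record's horizon is the newest XID required by any included change. */
static inline TransactionId
mock_record_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
						 bool do_freeze, TransactionId freeze_conflict_xid,
						 bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```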

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 01c19ca8796..a127e29144e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,7 +213,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -375,9 +379,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -840,7 +845,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -858,7 +863,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -887,15 +928,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -925,7 +964,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -940,12 +980,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -973,15 +1011,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -990,8 +1030,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1061,6 +1101,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1082,14 +1146,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1103,6 +1170,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1110,29 +1198,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1142,33 +1213,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In that case heap_page_is_all_visible() can
+		 * find the page completely frozen even though prstate.set_all_frozen
+		 * is false, so only the implication below is asserted.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture whether the page was newly set all-frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bbb223dd0d2..f77a00291bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -264,7 +264,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -280,8 +281,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -315,26 +315,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Counts (0 or 1) of whether the page was newly set all-visible and/or
+	 * all-frozen in the VM during phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -471,7 +457,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
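
For readers following the conflict-horizon change in pruneheap.c above:
the prune/freeze record now carries a single horizon, the newest of up to
three candidates (VM set, freeze, prune). A minimal standalone sketch of
that selection follows. It is not the patch's code: Xid, xid_follows() and
record_conflict_xid() are hypothetical stand-ins, the comparison is a
simplified modulo-2^32 check in the spirit of TransactionIdFollows(), and
starting from an invalid XID is an assumption, since the initialization of
conflict_xid lies outside the hunk shown.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;				/* stand-in for TransactionId */
#define INVALID_XID			((Xid) 0)	/* stand-in for InvalidTransactionId */
#define FIRST_NORMAL_XID	((Xid) 3)	/* XIDs below this are special */

/* Simplified wraparound-aware "a is newer than b" comparison. */
static bool
xid_follows(Xid a, Xid b)
{
	/* Special XIDs compare as plain unsigned integers. */
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	/* Normal XIDs compare modulo 2^32. */
	return (int32_t) (a - b) > 0;
}

/*
 * Pick the newest horizon required by any change included in the record.
 * Assumption: the horizon starts out invalid when nothing needs one.
 */
static Xid
record_conflict_xid(bool do_set_vm, Xid newest_live_xid,
					bool do_freeze, Xid freeze_conflict_xid,
					bool do_prune, Xid latest_xid_removed)
{
	Xid			conflict_xid = INVALID_XID;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

Because each later candidate only replaces the current value when it
follows it, the order of the three checks does not affect the result.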



  [text/x-patch] v38-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v38-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From db4cc2361ccc446b54df3d2d5afde70f6869dde1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v38 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering a critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
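
The VM update in the patch above comes down to setting two bits per heap
block. A rough sketch of that bit arithmetic follows, modeled on the
expression `map[mapByte] |= (flags << mapOffset)` in the old
visibilitymap_set() that patch 0007 removes. The names vm_get() and
vm_set() and the flat byte array are assumptions for illustration only:
the real map is split across pages, and buffer locking, dirtying and WAL
logging are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE 4	/* 8 bits / 2 bits per heap block */

/* Return the two status bits stored for heap_blk. */
static uint8_t
vm_get(const uint8_t *map, uint32_t heap_blk)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	return (map[map_byte] >> map_offset) & VM_VALID_BITS;
}

/*
 * OR the given flags into the bits for heap_blk.  Returns true if the
 * stored bits differed from flags beforehand (the caller would then
 * dirty the VM buffer).
 */
static bool
vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	if (vm_get(map, heap_blk) == flags)
		return false;

	map[map_byte] |= (uint8_t) (flags << map_offset);
	return true;
}
```

For an empty page, as in lazy_scan_new_or_empty(), the flags passed would
be VM_ALL_VISIBLE | VM_ALL_FROZEN, i.e. both bits at once.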



  [text/x-patch] v38-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 8-v38-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From 97a248e7711eaed31954dd7089790e3369b0c58a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v38 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external code that calls
visibilitymap_set() or consumes the VM-only WAL record will need to be
updated.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d9042e1f91d 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a127e29144e..9b5a0726f2b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1187,8 +1187,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ec8513d90b5..4c7ce9bd4b5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4389,7 +4389,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
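
Note on the bit arithmetic kept by the renamed visibilitymap_set(): the
removed function computed a byte index and a bit offset for the heap block
and OR-ed the flags into the map. The following is a minimal standalone
sketch of that arithmetic, not the PostgreSQL implementation. It assumes a
flat in-memory byte array with two bits per heap block and ignores the page
header, multi-page mapping, buffer locking, and WAL. All demo_* and DEMO_*
names are hypothetical; only the flag values 0x01/0x02/0x03 mirror
visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric values as the VISIBILITYMAP_* flags; names are illustrative. */
#define DEMO_ALL_VISIBLE		0x01
#define DEMO_ALL_FROZEN			0x02
#define DEMO_VALID_BITS			0x03

/* Assumption for this sketch: two status bits per heap block, four per byte. */
#define DEMO_BITS_PER_HEAPBLOCK		2
#define DEMO_HEAPBLOCKS_PER_BYTE	4

/* Read the two status bits for heapBlk from a flat map array. */
static inline uint8_t
demo_vm_get(const uint8_t *map, uint32_t heapBlk)
{
	uint32_t	mapByte = heapBlk / DEMO_HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (uint8_t) ((heapBlk % DEMO_HEAPBLOCKS_PER_BYTE) *
									   DEMO_BITS_PER_HEAPBLOCK);

	return (uint8_t) ((map[mapByte] >> mapOffset) & DEMO_VALID_BITS);
}

/* OR the given flags into the status bits for heapBlk. */
static inline void
demo_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = heapBlk / DEMO_HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (uint8_t) ((heapBlk % DEMO_HEAPBLOCKS_PER_BYTE) *
									   DEMO_BITS_PER_HEAPBLOCK);

	/* Only valid bits may be passed */
	assert((flags & DEMO_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible */
	assert(flags != DEMO_ALL_FROZEN);

	map[mapByte] |= (uint8_t) (flags << mapOffset);
}
```

For example, heap block 5 lands in byte 1 at bit offset 2, so setting both
flags there ORs 0x0C into that byte.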



  [text/x-patch] v38-0008-Track-which-relations-are-modified-by-a-query.patch (5.8K, 9-v38-0008-Track-which-relations-are-modified-by-a-query.patch)
From f707d345f3ee43a9b5e914e4d496c83485ea380b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v38 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the executor state.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)
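
Illustration of the idea in the commit message above, as a standalone
sketch rather than the patch's code: the patch stores RT indexes in a
Bitmapset, while this toy uses a single 64-bit word and therefore assumes
fewer than 64 range-table entries. RelidSet, relids_*, and scan_may_set_vm
are hypothetical names. The last helper only guesses at how a later commit
might consult the set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: one bit per range-table index (1..63). */
typedef uint64_t RelidSet;

/* Return the set with range-table index rti added. */
static inline RelidSet
relids_add(RelidSet set, unsigned rti)
{
	assert(rti >= 1 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

/* Is rti a member of the set? */
static inline bool
relids_is_member(RelidSet set, unsigned rti)
{
	assert(rti >= 1 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* True if every member of a is also in b (the cross-check's invariant). */
static inline bool
relids_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/*
 * Hypothetical consumer: a scan would be allowed to set the VM only for
 * relations that the query is not expected to modify.
 */
static inline bool
scan_may_set_vm(RelidSet modified, unsigned rti)
{
	return !relids_is_member(modified, rti);
}
```

Because the set is only a hint, over-inclusion (for example, row marks that
never modify a tuple) costs a missed optimization but is otherwise harmless,
which is why the assertion checks for a subset and not equality.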

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..57dcdeda056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3033,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3165,6 +3179,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..0f8364b8720 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +925,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82c442d23f8..1411d5276ca 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -705,6 +705,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE or that
+	 * have any row mark (e.g. SELECT FOR UPDATE/SHARE).
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v38-0009-Thread-flags-through-begin-scan-APIs.patch (28.1K, 10-v38-0009-Thread-flags-through-begin-scan-APIs.patch)
From e501ec27844ae056c9d5b0439e327ded450c9ce2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v38 09/12] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  6 ++--
 src/backend/access/index/genam.c          |  4 +--
 src/backend/access/index/indexam.c        |  8 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 13 +++++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++---
 src/backend/commands/typecmds.c           |  4 +--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++---
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  6 ++--
 src/backend/executor/nodeIndexscan.c      |  8 +++---
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 ++--
 src/backend/executor/nodeTidrangescan.c   |  6 ++--
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  5 ++--
 src/include/access/heapam.h               |  5 ++--
 src/include/access/tableam.h              | 35 ++++++++++++++---------
 25 files changed, 81 insertions(+), 65 deletions(-)
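
Illustration of the flag-threading pattern, as a standalone sketch and not
the table AM code: the helper accepts caller-supplied flags and ORs in the
bits it owns, much as table_beginscan_parallel() does in this patch with
"flags |= SO_TYPE_SEQSCAN | ...". All DEMO_* names and bit values are
hypothetical. In particular, DEMO_SO_REL_READ_ONLY stands in for a hint that
follow-up work might add, and the assertion on internal bits is an extra
check not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bits owned by the begin-scan helper (illustrative values only). */
#define DEMO_SO_TYPE_SEQSCAN	(UINT32_C(1) << 0)
#define DEMO_SO_ALLOW_PAGEMODE	(UINT32_C(1) << 1)
#define DEMO_SO_INTERNAL_MASK	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE)

/* Hypothetical caller-supplied hint, standing in for future work. */
#define DEMO_SO_REL_READ_ONLY	(UINT32_C(1) << 2)

typedef struct DemoScanDesc
{
	uint32_t	flags;			/* caller hints plus internal scan options */
} DemoScanDesc;

/* Build a scan descriptor, preserving whatever hints the caller passed. */
static inline DemoScanDesc
demo_beginscan(uint32_t flags)
{
	DemoScanDesc scan;

	/* Callers pass only hints; the helper owns the internal bits. */
	assert((flags & DEMO_SO_INTERNAL_MASK) == 0);

	flags |= DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE;
	scan.flags = flags;
	return scan;
}
```

Passing 0, as every call site converted by this patch does, yields exactly
the old set of internal options, which is why the patch changes no behavior.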

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1909c3254b5..a221e032f5d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 97cea5f7d4e..74243efa74f 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2065,7 +2065,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 42bf73d3138..6122603d11e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..87219613f0b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +593,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 69ef1527e06..bc4eedba4ac 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..e946cfb393a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -190,12 +191,14 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
 
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
 	/* disable syncscan in parallel tid range scan. */
@@ -248,7 +251,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..900199dbe29 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1159,7 +1159,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cd6d720386f..0455b36c41e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6396,7 +6396,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13965,7 +13965,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22867,7 +22867,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23331,7 +23331,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 74eac93284e..620fc7e259a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,7 +108,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9e8ea8ddf22..aefb792ee6e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -788,7 +788,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -854,7 +854,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4513b1f7a90..477cd4fcf99 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1723,7 +1723,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1787,7 +1787,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 1b0af70fd7a..47660baf2fa 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -297,7 +297,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 503817da65b..461edb8893b 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -459,7 +459,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -493,5 +493,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..9abcc99d6c8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -182,7 +182,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f77a00291bb..c2621dc2fac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,8 +95,9 @@ typedef struct HeapScanDescData
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
 	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map.
 	 */
 	Buffer		rs_vmbuffer;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f1065e30638 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -418,9 +418,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * 'flags' is a bitmask of SO_* flags providing hints from the executor
+	 * about the scan context.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +897,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +942,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -957,9 +960,9 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	flags |= SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
 		flags |= SO_ALLOW_STRAT;
@@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
 
@@ -1139,7 +1143,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1154,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,10 +1178,13 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * 'flags' is a bitmask of SO_* flags providing hints from the executor about
+ * the scan context.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1194,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
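
For readers skimming the diff above: the table_beginscan*() wrappers used to declare a local `flags` initialized to their fixed SO_* bits, and now take a caller-supplied `flags` argument and OR the fixed bits into it. Below is a minimal standalone sketch of that pattern. It is not code from the patch: the DEMO_* names and bit values are hypothetical stand-ins, and the function only returns the combined mask instead of starting a scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for SO_* scan option bits. The names and values
 * are illustrative only and are not copied from tableam.h.
 */
#define DEMO_SO_TYPE_SEQSCAN	(UINT32_C(1) << 0)
#define DEMO_SO_ALLOW_STRAT		(UINT32_C(1) << 4)
#define DEMO_SO_ALLOW_SYNC		(UINT32_C(1) << 5)
#define DEMO_SO_ALLOW_PAGEMODE	(UINT32_C(1) << 6)
#define DEMO_SO_CALLER_HINT		(UINT32_C(1) << 10)

/*
 * Same shape as the patched table_beginscan(): the caller passes extra
 * flags (0 if it has none) and the wrapper ORs in the bits it always sets.
 */
static uint32_t
demo_beginscan_flags(uint32_t flags)
{
	flags |= DEMO_SO_TYPE_SEQSCAN |
		DEMO_SO_ALLOW_STRAT | DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE;

	return flags;
}

/* True if every bit in 'bits' is set in 'flags'. */
static bool
demo_has_all(uint32_t flags, uint32_t bits)
{
	return (flags & bits) == bits;
}
```

Passing 0, as every call site in this patch does, yields exactly the old fixed mask, so the patch on its own is a pure signature change. A caller that passes a hint bit gets that bit in addition to the fixed ones.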



  [text/x-patch] v38-0010-Pass-down-information-on-table-modification-to-s.patch (14.5K, 11-v38-0010-Pass-down-information-on-table-modification-to-s.patch)
From 3a6b08fc3219afd79dc81a5219e6a543d67036f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v38 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 ++++++-
 src/backend/executor/nodeIndexonlyscan.c  | 25 +++++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 32 ++++++++++++++++++++---
 src/backend/executor/nodeSamplescan.c     |  8 +++++-
 src/backend/executor/nodeSeqscan.c        | 26 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 24 ++++++++++++++---
 src/include/access/heapam.h               |  6 +++++
 src/include/access/tableam.h              |  3 +++
 9 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6122603d11e..d35b688d751 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 620fc7e259a..a5ab5e2b37f 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -104,11 +104,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index aefb792ee6e..6d7a32c1cb8 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -761,6 +768,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -782,13 +790,18 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -829,6 +842,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -848,13 +862,18 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 477cd4fcf99..52b7fc46593 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1696,6 +1710,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1717,13 +1732,17 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1762,6 +1781,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1781,13 +1801,17 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 47660baf2fa..62eff19bc4f 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -291,13 +291,19 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) scanstate->ss.ps.plan)->scanrelid,
+						   scanstate->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..65349ea9c54 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 461edb8893b..7fbdf401734 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,16 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = 0;
+
+			if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+							   estate->es_modified_relids))
+				flags |= SO_HINT_REL_READ_ONLY;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -451,15 +457,21 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -489,9 +501,15 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c2621dc2fac..978ea90ffa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f1065e30638..57ce94a386f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
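
Every scan node in the patch above repeats the same test: if the node's scanrelid is not a member of es_modified_relids, set SO_HINT_REL_READ_ONLY. heapam_index_fetch_begin() then stores the inverse as modifies_base_rel. The standalone sketch below walks through that logic. It is an illustration under stated assumptions, not the patch's code: a 64-bit mask stands in for PostgreSQL's Bitmapset, all demo_* and DEMO_* names are hypothetical, and the patch itself open-codes the check at each call site instead of using a helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SO_HINT_REL_READ_ONLY (1 << 10 in the patch). */
#define DEMO_SO_HINT_REL_READ_ONLY	(UINT32_C(1) << 10)

/*
 * Toy replacement for a Bitmapset of range table indexes: bit N set means
 * relid N is modified by the query. Only relids 1..63 are representable.
 */
typedef uint64_t DemoRelidSet;

static bool
demo_is_member(int relid, DemoRelidSet set)
{
	return relid > 0 && relid < 64 &&
		(set & (UINT64_C(1) << relid)) != 0;
}

/*
 * The check each scan node performs before beginning its scan: hint
 * read-only unless the scanned relation is in the modified set.
 */
static uint32_t
demo_scan_hint_flags(int scanrelid, DemoRelidSet modified_relids)
{
	uint32_t	flags = 0;

	if (!demo_is_member(scanrelid, modified_relids))
		flags |= DEMO_SO_HINT_REL_READ_ONLY;

	return flags;
}

/* The inversion done in heapam_index_fetch_begin() in the patch. */
static bool
demo_modifies_base_rel(uint32_t flags)
{
	return (flags & DEMO_SO_HINT_REL_READ_ONLY) == 0;
}
```

Because the hint is the "read-only" bit, a caller that passes flags = 0 is treated as possibly modifying the relation. In this sketch that is the conservative default: callers that never compute the hint simply never get the read-only treatment.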



  [text/x-patch] v38-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.5K, 12-v38-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From aa05e68336207dbb64c0468ab3f017f8f66f9e05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v38 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 46 ++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 +++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY) != 0);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d35b688d751..a083b69ffcd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						(scan->rs_flags & SO_HINT_REL_READ_ONLY) != 0);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b5a0726f2b..3cdc1a36441 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -233,7 +236,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -315,6 +319,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -371,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -440,9 +447,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -873,21 +879,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1103,7 +1125,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 978ea90ffa2..768d442c39c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -97,7 +98,8 @@ typedef struct HeapScanDescData
 	/*
 	 * For sequential scans, bitmap heap scans, TID range scans, and sample
 	 * scans. The current heap block's corresponding page in the visibility
-	 * map.
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -129,7 +131,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -440,7 +446,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
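
For readers following the thread, the decision made by heap_page_will_set_vm() in the patch above can be modelled in isolation. The following is only a sketch under stated assumptions, not PostgreSQL code: ToyPruneState and toy_will_set_vm() are hypothetical names, the buffer_is_dirty and needs_fpi fields stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(), and the model stops where the quoted hunk stops (it does not compute new_vmbits).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PruneReason values used in the patch. */
typedef enum
{
	PRUNE_ON_ACCESS,
	PRUNE_VACUUM_SCAN
} PruneReason;

/* Hypothetical, reduced version of the fields the decision reads. */
typedef struct
{
	bool		attempt_set_vm;		/* caller passed HEAP_PAGE_PRUNE_SET_VM */
	bool		set_all_visible;	/* page looks all-visible after pruning */
	bool		set_all_frozen;		/* page looks all-frozen after pruning */
	bool		buffer_is_dirty;	/* models BufferIsDirty() */
	bool		needs_fpi;			/* models XLogCheckBufferNeedsBackup() */
} ToyPruneState;

/*
 * Returns true if the VM should be set. On-access callers that neither
 * prune nor freeze decline when setting the VM would newly dirty the
 * page or force a full-page image into the WAL.
 */
static bool
toy_will_set_vm(ToyPruneState *st, PruneReason reason,
				bool do_prune, bool do_freeze)
{
	/* Mirrors the Assert the caller makes before the real function runs. */
	assert(!st->set_all_frozen || st->set_all_visible);

	if (!st->attempt_set_vm)
		return false;

	if (!st->set_all_visible)
		return false;

	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_is_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	return true;
}
```

In this model a vacuum caller is never subject to the dirty/FPI check, while an on-access caller skips it only when it is already pruning or freezing, which matches the condition in the hunk.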



  [text/x-patch] v38-0012-Set-pd_prune_xid-on-insert.patch (10.9K, 13-v38-0012-Set-pd_prune_xid-on-insert.patch)
From fc597950684dad6328114ac0d10f791bc52b53c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v38 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run and set the VM
all-visible after a page is filled with newly inserted tuples the first
time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 17 ++++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3cdc1a36441..7cb9e1e2aac 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,7 +255,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1848,16 +1849,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
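
The effect of the new PageSetPrunable() calls on insert can be illustrated with a small standalone model. This is a sketch based on my reading of the PageSetPrunable macro and TransactionIdPrecedes(), not the PostgreSQL source: ToyXid, ToyPage and the toy_* functions are hypothetical names, and the guard in toy_insert_hint() copies the condition the patch adds to heap_insert() (normal XID, tuple not inserted frozen).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)	/* XIDs below this are special */

/* Hypothetical one-field page header. */
typedef struct
{
	ToyXid		pd_prune_xid;
} ToyPage;

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is older than b" for normal XIDs. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	if (!toy_xid_is_normal(a) || !toy_xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID that might make the page worth pruning. */
static void
toy_page_set_prunable(ToyPage *page, ToyXid xid)
{
	assert(toy_xid_is_normal(xid));
	if (page->pd_prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* Models the guard added to heap_insert(): skip bootstrap and frozen inserts. */
static void
toy_insert_hint(ToyPage *page, ToyXid xid, bool insert_frozen)
{
	if (toy_xid_is_normal(xid) && !insert_frozen)
		toy_page_set_prunable(page, xid);
}
```

Because the hint only ever moves to an older XID, repeated inserts into the same page leave pd_prune_xid at the oldest inserter, which is the XID the on-access path later compares against its visibility horizon.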




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-16 14:53  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-16 14:53 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sun, Mar 15, 2026 at 3:10 PM Melanie Plageman
<[email protected]> wrote:
>
> I've pushed a few more of the trivial commits in the set. Attached v38
> has the remaining patches.

Looks like cfbot wasn't able to rebase v38 on its own for some reason.
v39 attached.

- Melanie


Attachments:

  [text/x-patch] v39-0001-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 2-v39-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From c49f30de550eb8f7c87a7ae80435abda3021fa3a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v39 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..52cafb23c6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption as well as resetting the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, for vacuum, it's possible
+		 * that the bit got cleared after heap_vac_scan_next_block() was
+		 * called, so we must recheck now that we have the buffer lock before
+		 * concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v39-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 3-v39-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 788860ded375fcf744201347b9dcbf496070bfb5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v39 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid this, if a page is already all-frozen or it is
all-visible and no freezing will be attempted, we exit early.
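
For readers skimming the thread, the early-exit condition can be sketched as a
standalone predicate. This is only an illustration under stated assumptions:
the SKETCH_* flag values and the helper name sketch_can_bypass() are made up
for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the visibility map flag bits (values assumed for this sketch). */
#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

/*
 * Hypothetical helper: returns true when no pruning or freezing work can
 * remain for a page, given its VM bits and whether the caller wants to
 * attempt freezing.
 */
bool
sketch_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	/* An all-frozen page has nothing left to prune or freeze. */
	if (vmbits & SKETCH_VM_ALL_FROZEN)
		return true;

	/* An all-visible page only has work left if freezing is requested. */
	return (vmbits & SKETCH_VM_ALL_VISIBLE) != 0 && !attempt_freeze;
}
```

With these assumptions, an all-visible page is bypassed for a non-freezing
caller but still processed when freezing is attempted.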

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52cafb23c6b..a4a0a916f61 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -882,6 +883,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1008,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0



  [text/x-patch] v39-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 4-v39-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 5eac34a809eac866d0cd6bf58e305464d3f2e094 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v39 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.
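
The deferred check can be sketched roughly as follows. This is a simplified
model, not the patch's code: SketchXid, sketch_xid_maybe_running() and the
single "horizon" value are assumptions standing in for TransactionId and the
GlobalVisState test, and the counter exists only to show that the expensive
test runs at most once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		0
#define SKETCH_FIRST_NORMAL_XID	3

/* Counts calls to the stand-in for the expensive visibility test. */
int			sketch_expensive_checks = 0;

/*
 * Hypothetical stand-in for the GlobalVisState test: an XID at or after the
 * horizon may still be considered running by some snapshot.
 */
bool
sketch_xid_maybe_running(SketchXid horizon, SketchXid xid)
{
	sketch_expensive_checks++;
	return (int32_t) (xid - horizon) >= 0;
}

/* Wraparound-aware "a is newer than b"; both must be normal XIDs. */
bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) > 0;
}

/*
 * Scan the xmins of the live tuples on a page, remember only the newest
 * normal one, and test that single XID after the loop.
 */
bool
sketch_page_all_visible(const SketchXid *xmins, size_t n, SketchXid horizon)
{
	SketchXid	newest = SKETCH_INVALID_XID;

	for (size_t i = 0; i < n; i++)
	{
		SketchXid	xmin = xmins[i];

		/* Special XIDs (such as a frozen xmin) never hold the page back. */
		if (xmin < SKETCH_FIRST_NORMAL_XID)
			continue;

		if (newest == SKETCH_INVALID_XID || sketch_xid_follows(xmin, newest))
			newest = xmin;
	}

	/* One expensive check per page, and none if no normal xmin was seen. */
	if (newest != SKETCH_INVALID_XID &&
		sketch_xid_maybe_running(horizon, newest))
		return false;

	return true;
}
```

If the newest xmin passes the test, every older xmin on the page would pass it
too, which is why checking only that one value is enough in this model.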

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a4a0a916f61..05fe3deeb95 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1028,6 +1028,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1695,29 +1706,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..bbb223dd0d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -479,6 +479,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v39-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 5-v39-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From 9d36149f134e4935eda6e37f111faf164a9bd063 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v39 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.
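
A small model of the new bookkeeping, for illustration only. DemoPruneState,
demo_record_live() and demo_vm_conflict_horizon() are hypothetical names that
loosely mirror the fields touched by this patch; the XID comparison is
simplified to normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

#define DEMO_INVALID_XID		0
#define DEMO_FIRST_NORMAL_XID	3

/* Hypothetical, cut-down model of the per-page pruning state. */
typedef struct DemoPruneState
{
	bool		set_all_visible;
	bool		set_all_frozen;
	DemoXid		newest_live_xid;
} DemoPruneState;

/*
 * Record one live tuple. A tuple whose xmin is not hinted committed clears
 * the visibility flags. The newest xmin is tracked for every committed
 * tuple, whether or not the page can still be all-visible.
 */
void
demo_record_live(DemoPruneState *st, DemoXid xmin, bool xmin_committed)
{
	if (!xmin_committed)
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return;
	}

	if (xmin >= DEMO_FIRST_NORMAL_XID &&
		(st->newest_live_xid == DEMO_INVALID_XID ||
		 (int32_t) (xmin - st->newest_live_xid) > 0))
		st->newest_live_xid = xmin;
}

/* An all-frozen page needs no conflict horizon; otherwise use the newest xmin. */
DemoXid
demo_vm_conflict_horizon(const DemoPruneState *st)
{
	return st->set_all_frozen ? DEMO_INVALID_XID : st->newest_live_xid;
}
```

In this model the value cannot go stale: a tuple seen after the page was
already ruled out as all-visible still advances newest_live_xid.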

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05fe3deeb95..01c19ca8796 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -435,53 +432,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -711,7 +690,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -966,9 +944,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. presult->set_all_frozen is always false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,9 +1011,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1187,7 +1164,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1647,6 +1624,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1694,32 +1672,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



  [text/x-patch] v39-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 6-v39-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From bc988115bb293945e0d09028bf235976ef90c8c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v39 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)
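
For readers skimming the diff below, here is a rough standalone sketch of the two decisions this patch moves into heap_page_prune_and_freeze(): which VM bits to set, and which snapshot conflict horizon the combined record should carry. It is not code from the patch. The helper names (xid_follows, sketch_new_vmbits, sketch_conflict_xid) are hypothetical, the typedef and macros are simplified stand-ins for the PostgreSQL definitions, and the bypass path for already all-frozen pages is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL definitions (assumed, for illustration). */
typedef uint32_t TransactionId;
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Hypothetical helper shaped like heap_page_will_set_vm(): return the VM
 * bits to set, or 0 if the VM should be left alone.
 */
static uint8_t
sketch_new_vmbits(bool on_access, bool all_visible, bool all_frozen,
				  uint8_t old_vmbits)
{
	uint8_t		new_vmbits;

	/* A page cannot be all-frozen without also being all-visible. */
	assert(!all_frozen || all_visible);

	/* On-access pruning never sets the VM in this patch. */
	if (on_access || !all_visible)
		return 0;

	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (all_frozen)
		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	/* Nothing to do if the VM already holds exactly these bits. */
	return (new_vmbits == old_vmbits) ? 0 : new_vmbits;
}

/*
 * Hypothetical helper: the horizon for the whole record is the newest XID
 * required by any change that will actually be made.
 */
static TransactionId
sketch_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
					bool do_freeze, TransactionId freeze_conflict_xid,
					bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

In the sketch, a horizon only contributes when its change is actually being made, so a page that is neither pruned, frozen, nor set in the VM yields InvalidTransactionId.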

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 01c19ca8796..a127e29144e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,7 +213,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -375,9 +379,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -840,7 +845,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -858,7 +863,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for the heap page using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -887,15 +928,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -925,7 +964,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -940,12 +980,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -973,15 +1011,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -990,8 +1030,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1061,6 +1101,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1082,14 +1146,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1103,6 +1170,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1110,29 +1198,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1142,33 +1213,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bbb223dd0d2..f77a00291bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -264,7 +264,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -280,8 +281,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -315,26 +315,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -471,7 +457,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0



  [text/x-patch] v39-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v39-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 5c17a542a95c880f6a8ffaa1dd92baf12b96a1ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v39 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
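
Reader's note (below the "---" line, so not part of the commit message):
the idea in this commit is that a single record describes both the heap
page change (PD_ALL_VISIBLE) and the VM change, and replay applies both
from that one record. The following is a minimal standalone sketch of
that idea, not the patch's code. Every Toy*/toy_* name is hypothetical,
and it deliberately ignores buffer locking, critical sections, LSN
checks, and full-page images.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02

/* Hypothetical stand-ins for a heap page, its VM bits, and a WAL record. */
typedef struct ToyHeapPage { bool pd_all_visible; } ToyHeapPage;
typedef struct ToyVmBits { uint8_t bits; } ToyVmBits;
typedef struct ToyPruneRecord { uint8_t vmflags; } ToyPruneRecord;

/* The VM bit must never be set while the page-level bit is clear. */
static inline bool
toy_vm_consistent(const ToyHeapPage *page, const ToyVmBits *vm)
{
    return !(vm->bits & TOY_VM_ALL_VISIBLE) || page->pd_all_visible;
}

/* Shared by the original operation and replay: heap flag first, then VM. */
static inline void
toy_apply(ToyHeapPage *page, ToyVmBits *vm, uint8_t vmflags)
{
    page->pd_all_visible = true;
    vm->bits |= vmflags;
    assert(toy_vm_consistent(page, vm));
}

/* "Primary": change both pages, then describe both changes in one record. */
static inline ToyPruneRecord
toy_mark_empty_page(ToyHeapPage *page, ToyVmBits *vm)
{
    ToyPruneRecord rec = { TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN };

    toy_apply(page, vm, rec.vmflags);
    return rec;
}

/* "Standby": the same record is enough to redo both changes. */
static inline void
toy_redo(const ToyPruneRecord *rec, ToyHeapPage *page, ToyVmBits *vm)
{
    if (rec->vmflags != 0)
        toy_apply(page, vm, rec->vmflags);
}

/* Returns true when primary and standby end up in the same state. */
static inline bool
toy_demo_replay_matches(void)
{
    ToyHeapPage primary_page = { false }, standby_page = { false };
    ToyVmBits primary_vm = { 0 }, standby_vm = { 0 };
    ToyPruneRecord rec = toy_mark_empty_page(&primary_page, &primary_vm);

    toy_redo(&rec, &standby_page, &standby_vm);
    return primary_page.pd_all_visible == standby_page.pd_all_visible &&
        primary_vm.bits == standby_vm.bits &&
        standby_vm.bits == (TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN);
}
```

In the real code the corresponding steps are PageSetAllVisible(),
visibilitymap_set_vmbits(), and log_heap_prune_and_freeze() on the
primary, and heap_xlog_prune_freeze() during replay.
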
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v39-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 8-v39-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 2e58fcd19b1bf57b0796f2ddcd74a6f2ee760ead Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v39 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
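
Reader's note (below the "---" line, so not part of the commit message):
after this commit, visibilitymap_set() only flips bits in an already
pinned and locked VM page. The sketch below shows the addressing
arithmetic involved, assuming two bits per heap block, four heap blocks
per map byte, the default 8 kB block size, and a 24-byte page header.
The TOY_*/toy_* names are hypothetical and only mirror the
HEAPBLK_TO_MAPBLOCK / HEAPBLK_TO_MAPBYTE / HEAPBLK_TO_OFFSET macros used
in visibilitymap.c. This is not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes: default 8 kB block, 24-byte page header. */
#define TOY_BLCKSZ              8192
#define TOY_PAGE_HEADER_SIZE    24
#define TOY_MAPSIZE             (TOY_BLCKSZ - TOY_PAGE_HEADER_SIZE)

#define TOY_BITS_PER_HEAPBLOCK  2
#define TOY_HEAPBLOCKS_PER_BYTE 4
#define TOY_HEAPBLOCKS_PER_PAGE (TOY_MAPSIZE * TOY_HEAPBLOCKS_PER_BYTE)

#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN  0x02
#define TOY_VALID_BITS  0x03

/* Only the bitmap portion of a VM page is modeled here. */
typedef struct ToyVmPage { uint8_t map[TOY_MAPSIZE]; } ToyVmPage;

/* Which VM page covers this heap block? */
static inline uint32_t
toy_map_block(uint32_t heap_blk)
{
    return heap_blk / TOY_HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page? */
static inline uint32_t
toy_map_byte(uint32_t heap_blk)
{
    return (heap_blk % TOY_HEAPBLOCKS_PER_PAGE) / TOY_HEAPBLOCKS_PER_BYTE;
}

/* Which bit offset within that byte? */
static inline uint8_t
toy_map_offset(uint32_t heap_blk)
{
    return (uint8_t) ((heap_blk % TOY_HEAPBLOCKS_PER_BYTE) *
                      TOY_BITS_PER_HEAPBLOCK);
}

static inline uint8_t
toy_vm_get(const ToyVmPage *vm, uint32_t heap_blk)
{
    return (vm->map[toy_map_byte(heap_blk)] >> toy_map_offset(heap_blk)) &
        TOY_VALID_BITS;
}

/* Set the given bits; returns true if the page content changed. */
static inline bool
toy_vm_set(ToyVmPage *vm, uint32_t heap_blk, uint8_t flags)
{
    assert((flags & TOY_VALID_BITS) == flags);
    /* Never set all-frozen without all-visible. */
    assert(flags != TOY_ALL_FROZEN);

    if (toy_vm_get(vm, heap_blk) == flags)
        return false;
    vm->map[toy_map_byte(heap_blk)] |=
        (uint8_t) (flags << toy_map_offset(heap_blk));
    return true;
}

/* Mark heap block 5 all-visible and all-frozen; return the touched byte. */
static inline uint8_t
toy_demo_byte_after_set(void)
{
    ToyVmPage vm = {{0}};

    (void) toy_vm_set(&vm, 5, TOY_ALL_VISIBLE | TOY_ALL_FROZEN);
    return vm.map[toy_map_byte(5)];
}
```

With these assumed sizes, one VM page covers 32672 heap blocks, and heap
block 5 lives in byte 1 at bit offset 2 of VM page 0.
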
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d9042e1f91d 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a127e29144e..9b5a0726f2b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1187,8 +1187,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..3102c61125e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4409,7 +4409,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v39-0008-Track-which-relations-are-modified-by-a-query.patch (5.8K, 9-v39-0008-Track-which-relations-are-modified-by-a-query.patch)
From cb87aa75f03e0c211cfab4f582d10eec7e0a50aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v39 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the executor state.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
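
Reader's note (below the "---" line, so not part of the commit message):
the bookkeeping described above amounts to a set of range-table indexes
that is filled during executor startup and later consulted as a hint.
The sketch below is a standalone toy, assuming a single 64-bit word in
place of a Bitmapset (so RT indexes 1..63 only). All toy_* names are
hypothetical. In particular, toy_scan_may_set_vm() is only a guess at
how a later commit might consult the set. It is not code from this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bitmap: one 64-bit word, so RT indexes 1..63 only (assumption). */
typedef uint64_t ToyRelids;

static inline ToyRelids
toy_add_member(ToyRelids set, unsigned rti)
{
    assert(rti >= 1 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

static inline bool
toy_is_member(ToyRelids set, unsigned rti)
{
    assert(rti >= 1 && rti < 64);
    return (set & (UINT64_C(1) << rti)) != 0;
}

/* True if every member of a is also in b (cf. the assert-only cross-check). */
static inline bool
toy_is_subset(ToyRelids a, ToyRelids b)
{
    return (a & ~b) == 0;
}

/*
 * Hypothetical scan-side hint: skip setting the VM for a relation the
 * query is expected to modify right afterwards.
 */
static inline bool
toy_scan_may_set_vm(ToyRelids modified, unsigned scan_rti)
{
    return !toy_is_member(modified, scan_rti);
}

/* Example query shape: rti 1 is a result relation, rti 3 has a row mark. */
static inline ToyRelids
toy_demo_modified(void)
{
    ToyRelids modified = 0;

    modified = toy_add_member(modified, 1);    /* result relation */
    modified = toy_add_member(modified, 3);    /* relation with a row mark */
    return modified;
}
```

Because the set is only a hint, over-including relations (for example,
every row mark type) costs a missed optimization at worst, which is why
the cross-check asserts a subset relation rather than equality.
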
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..8d22b6db867 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -922,6 +922,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3048,6 +3056,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3180,6 +3194,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..7dfa95c2cbe 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,6 +125,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -873,6 +875,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -898,6 +927,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..080cfdac48e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -707,6 +707,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..d2f4f8ea748 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -688,6 +688,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v39-0009-Thread-flags-through-begin-scan-APIs.patch (28.1K, 10-v39-0009-Thread-flags-through-begin-scan-APIs.patch)
From 335d6419b443b0c574a4458212bde607ad70a89d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v39 09/12] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  6 ++--
 src/backend/access/index/genam.c          |  4 +--
 src/backend/access/index/indexam.c        |  8 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 13 +++++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++---
 src/backend/commands/typecmds.c           |  4 +--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++---
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  6 ++--
 src/backend/executor/nodeIndexscan.c      |  8 +++---
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 ++--
 src/backend/executor/nodeTidrangescan.c   |  6 ++--
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  5 ++--
 src/include/access/heapam.h               |  5 ++--
 src/include/access/tableam.h              | 35 ++++++++++++++---------
 25 files changed, 81 insertions(+), 65 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..1e950d8e6e5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,7 +80,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -762,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..87219613f0b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +593,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..e946cfb393a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -190,12 +191,14 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
 
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
 	/* disable syncscan in parallel tid range scan. */
@@ -248,7 +251,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..fb791c7990b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1160,7 +1160,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dfdde986236..4b50d325612 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22888,7 +22888,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23352,7 +23352,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..17bf4976cce 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -790,7 +790,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +856,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..88bdf0a52d1 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +209,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1726,7 +1726,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1790,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..db102803eb5 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +184,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f77a00291bb..c2621dc2fac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,8 +95,9 @@ typedef struct HeapScanDescData
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
 	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map.
 	 */
 	Buffer		rs_vmbuffer;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f1065e30638 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -418,9 +418,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * 'flags' is a bitmask of SO_* flags providing hints from the executor
+	 * about the scan context.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +897,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +942,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -957,9 +960,9 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	flags |= SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
 		flags |= SO_ALLOW_STRAT;
@@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
 
@@ -1139,7 +1143,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1154,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,10 +1178,13 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * 'flags' is a bitmask of SO_* flags providing hints from the executor about
+ * the scan context.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1194,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v39-0010-Pass-down-information-on-table-modification-to-s.patch (14.5K, 11-v39-0010-Pass-down-information-on-table-modification-to-s.patch)
From bf3e55f226c1f1aacac0b2739a6f42973942c6c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v39 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 ++++++-
 src/backend/executor/nodeIndexonlyscan.c  | 25 +++++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 32 ++++++++++++++++++++---
 src/backend/executor/nodeSamplescan.c     |  8 +++++-
 src/backend/executor/nodeSeqscan.c        | 26 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 24 ++++++++++++++---
 src/include/access/heapam.h               |  6 +++++
 src/include/access/tableam.h              |  3 +++
 9 files changed, 119 insertions(+), 15 deletions(-)
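
A note for readers, not part of the patch: each scan node touched below
repeats the same test against es_modified_relids before starting its scan.
The following is a minimal standalone sketch of that derivation. The toy_*
names and the uint64_t stand-in for a Bitmapset are assumptions made for
illustration; only SO_HINT_REL_READ_ONLY and its bit value are taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit the patch adds to ScanOptions in tableam.h. */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/* Hypothetical stand-in for a Bitmapset of range-table indexes (1..63). */
typedef uint64_t toy_relid_set;

static bool
toy_is_member(int relid, toy_relid_set set)
{
	assert(relid > 0 && relid < 64);
	return (set & (UINT64_C(1) << relid)) != 0;
}

/*
 * Hypothetical helper: compute the flags one scan node would pass down.
 * The hint is set only when the scanned relation is NOT in the set of
 * relations modified by the query.
 */
static uint32_t
toy_scan_flags(int scanrelid, toy_relid_set modified_relids)
{
	uint32_t	flags = 0;

	if (!toy_is_member(scanrelid, modified_relids))
		flags |= SO_HINT_REL_READ_ONLY;

	return flags;
}

/* Mirrors how heapam_index_fetch_begin() derives modifies_base_rel. */
static bool
toy_modifies_base_rel(uint32_t flags)
{
	return !(flags & SO_HINT_REL_READ_ONLY);
}
```

Under these assumptions, a caller that passes 0 for flags is treated as
possibly modifying the relation, which is the conservative default.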

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1e950d8e6e5..aec5199b2e6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -87,6 +87,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..0f30e6980de 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 17bf4976cce..3fab715f879 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -95,7 +101,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -763,6 +770,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -784,13 +792,18 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -831,6 +844,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -850,13 +864,18 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 88bdf0a52d1..6a235ef25ce 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,6 +104,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -113,7 +119,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -200,6 +207,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -209,7 +222,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1699,6 +1713,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1720,13 +1735,17 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1765,6 +1784,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1784,13 +1804,17 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..8d36fcda48a 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,19 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) scanstate->ss.ps.plan)->scanrelid,
+						   scanstate->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..9356973802b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +375,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +418,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..04a75e72fe1 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,16 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = 0;
+
+			if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+							   estate->es_modified_relids))
+				flags |= SO_HINT_REL_READ_ONLY;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +458,21 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +502,15 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c2621dc2fac..978ea90ffa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f1065e30638..57ce94a386f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0



  [text/x-patch] v39-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.5K, 12-v39-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 31c38dd70fb80b7bc6f2224529b6159a4886f11b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v39 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 46 ++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 +++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 52 insertions(+), 20 deletions(-)
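
A note for readers, not part of the patch: the early-exit checks that the
pruneheap.c hunk below adds to heap_page_will_set_vm() can be read as a pure
decision over a handful of booleans. The sketch below restates only the
checks visible in that hunk; the toy_* names and the struct are hypothetical,
and the real function goes on to compute the VM bits afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two PruneReason values that matter here. */
typedef enum
{
	TOY_PRUNE_ON_ACCESS,
	TOY_PRUNE_VACUUM_SCAN
} toy_prune_reason;

/* Hypothetical snapshot of the inputs the patch consults. */
typedef struct
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;	/* page found all-visible while pruning */
	bool		do_prune;			/* pruning will modify the page */
	bool		do_freeze;			/* freezing will modify the page */
	bool		buffer_dirty;		/* stands in for BufferIsDirty() */
	bool		needs_fpi;			/* stands in for XLogCheckBufferNeedsBackup() */
} toy_page_state;

static bool
toy_will_set_vm(const toy_page_state *s, toy_prune_reason reason)
{
	if (!s->attempt_set_vm)
		return false;

	if (!s->set_all_visible)
		return false;

	/*
	 * On-access, with no pruning or freezing to piggyback on, skip the VM
	 * update if it would newly dirty the page or force a full-page image.
	 */
	if (reason == TOY_PRUNE_ON_ACCESS && !s->do_prune && !s->do_freeze &&
		(!s->buffer_dirty || s->needs_fpi))
		return false;

	return true;
}
```

The point of the third check, as the patch comment describes it, is that a
read-only query should not pay for dirtying a clean page or emitting an FPI
solely to set VM bits.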

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index aec5199b2e6..17d625944e8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2543,7 +2544,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b5a0726f2b..3cdc1a36441 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -233,7 +236,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -315,6 +319,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -371,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -440,9 +447,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -873,21 +879,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1103,7 +1125,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 978ea90ffa2..768d442c39c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -97,7 +98,8 @@ typedef struct HeapScanDescData
 	/*
 	 * For sequential scans, bitmap heap scans, TID range scans, and sample
 	 * scans. The current heap block's corresponding page in the visibility
-	 * map.
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -129,7 +131,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans in queries that do not modify the underlying heap table,
+	 * on-access pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -440,7 +446,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v39-0012-Set-pd_prune_xid-on-insert.patch (10.9K, 13-v39-0012-Set-pd_prune_xid-on-insert.patch)
From 8483ddbb7f3226f73262be80031630638e413f37 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v39 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run and set the VM
all-visible after a page is filled with newly inserted tuples the first
time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 17 ++++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 56 insertions(+), 28 deletions(-)
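
A note for readers, not part of the patch: the insert paths below set
pd_prune_xid only for normal XIDs and only when the tuple is not inserted
frozen. The sketch below models that rule as a pure function. It assumes, as
I understand the PageSetPrunable() macro, that the hint keeps the older of
the existing value and the new XID. The toy_* names are hypothetical, and the
wraparound comparison is simplified to normal XIDs only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t toy_xid;

#define TOY_INVALID_XID			0	/* like InvalidTransactionId */
#define TOY_FIRST_NORMAL_XID	3	/* like FirstNormalTransactionId */

static bool
toy_xid_is_normal(toy_xid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparison; only meaningful when both XIDs are normal. */
static bool
toy_xid_precedes(toy_xid a, toy_xid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical model: return the page's pd_prune_xid after an insert by
 * transaction 'xid', given the current hint value.
 */
static toy_xid
toy_prune_xid_after_insert(toy_xid current, toy_xid xid, bool insert_frozen)
{
	/* Bootstrap-mode and frozen inserts leave the hint untouched. */
	if (!toy_xid_is_normal(xid) || insert_frozen)
		return current;

	/* Otherwise keep the older of the existing hint and the new XID. */
	if (current == TOY_INVALID_XID || toy_xid_precedes(xid, current))
		return xid;

	return current;
}
```

With a valid hint in place, heap_page_prune_opt() no longer returns early on
the first read of a freshly filled page, which is what gives on-access
pruning the chance to set the VM.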

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3cdc1a36441..7cb9e1e2aac 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,7 +255,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1848,16 +1849,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-17 09:05  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Kirill Reshke @ 2026-03-17 09:05 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Chao Li <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, 16 Mar 2026 at 19:53, Melanie Plageman
<[email protected]> wrote:
>
> On Sun, Mar 15, 2026 at 3:10 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > I've pushed a few more of the trivial commits in the set. Attached v38
> > has the remaining patches.
>
> Looks like cfbot wasn't able to rebase v38 on its own for some reason.
> v39 attached.
>
> - Melanie

Hi!

I took a quick look at v38-v39.

0001 & 0003 look ok.

> From 788860ded375fcf744201347b9dcbf496070bfb5 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 25 Feb 2026 16:48:19 -0500
> Subject: [PATCH v39 02/12] Add pruning fast path for all-visible and
 all-frozen pages

For the record, does this work with DISABLE_PAGE_SKIPPING? I think we
don't want the server to take the fast path when this option is set
by the user...



-- 
Best regards,
Kirill Reshke





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-17 14:48  Melanie Plageman <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-17 14:48 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Chao Li <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Mar 17, 2026 at 5:05 AM Kirill Reshke <[email protected]> wrote:
>
> > From 788860ded375fcf744201347b9dcbf496070bfb5 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 25 Feb 2026 16:48:19 -0500
> > Subject: [PATCH v39 02/12] Add pruning fast path for all-visible and
>  all-frozen pages
>
> For the record, does this work with DISABLE_PAGE_SKIPPING? I think we
> don't  want the server to "fast-path" in case this option is set by
> the user...

Hmm. This is a good point. The docs for DISABLE_PAGE_SKIPPING say it
is about fixing visibility map corruption, and the fast path does
detect and fix one type of visibility map corruption (a VM bit set
while the page-level all-visible hint is clear). It does not check
for dead line pointers, though. I suppose DISABLE_PAGE_SKIPPING would
want to do that kind of VM corruption detection as well. Thanks for
thinking of that. Attached v40 adds an option to disable the fast path.
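
As a rough illustration of the opt-in, here is a minimal standalone
sketch of the eligibility check, not the patch code itself. The
SKETCH_* names and both functions are hypothetical stand-ins: the VM
bit values are assumed to mirror access/visibilitymapdefs.h, and the
value used for the allow-fast-path flag is a placeholder, since the
real definition in heapam.h is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Placeholder bit; stands in for HEAP_PAGE_PRUNE_ALLOW_FAST_PATH. */
#define SKETCH_PRUNE_ALLOW_FAST_PATH (1 << 0)

/*
 * Vacuum side: only opt in to the fast path when the VM may be used to
 * skip pages, i.e. when DISABLE_PAGE_SKIPPING was not specified.
 */
int
sketch_vacuum_prune_options(bool skipwithvm)
{
	int			options = 0;

	if (skipwithvm)
		options |= SKETCH_PRUNE_ALLOW_FAST_PATH;
	return options;
}

/*
 * Pruning side: the page qualifies if the VM says it is all-frozen, or
 * all-visible while no freezing will be attempted, and the caller opted in.
 */
bool
sketch_fast_path(uint8_t vmbits, bool attempt_freeze, int options)
{
	bool		vm_says_done;

	vm_says_done = (vmbits & SKETCH_VM_ALL_FROZEN) ||
		((vmbits & SKETCH_VM_ALL_VISIBLE) && !attempt_freeze);

	return vm_says_done && (options & SKETCH_PRUNE_ALLOW_FAST_PATH) != 0;
}
```

With these definitions, an all-frozen page is only bypassed when
sketch_vacuum_prune_options() was called with skipwithvm = true, so
under DISABLE_PAGE_SKIPPING the page would get the full inspection.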

- Melanie


Attachments:

  [text/x-patch] v40-0001-Fix-visibility-map-corruption-in-more-cases.patch (18.7K, 2-v40-0001-Fix-visibility-map-corruption-in-more-cases.patch)
  download | inline diff:
From b02011f54bc2d79a2ac9be199aa6d0495ecaa958 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v40 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..52cafb23c6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption as well as resetting the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, for vacuum, it's possible
+		 * that the bit got cleared after heap_vac_scan_next_block() was
+		 * called, so we must recheck now that we have the buffer lock before
+		 * concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v40-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (7.4K, 3-v40-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
  download | inline diff:
From a503285e012de12539df384d615675c1e48e5cfd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v40 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid this, if a page is already all-frozen or it is
all-visible and no freezing will be attempted, we exit early. We can't
exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 92 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 10 +++
 src/include/access/heapam.h          |  1 +
 3 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52cafb23c6b..bf740c37f3d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,12 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/*
+	 * True if the page can bypass full page inspection during pruning and
+	 * freezing based on its visibility map status and the caller's options.
+	 */
+	bool		fast_path;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -184,6 +190,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -312,7 +319,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			params.options = 0;
+			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -381,6 +388,16 @@ prune_freeze_setup(PruneFreezeParams *params,
 											   prstate->block,
 											   &prstate->vmbuffer);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can skip pruning and freezing entirely.
+	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
+	 */
+	prstate->fast_path = ((prstate->vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+						   !prstate->attempt_freeze)) &&
+		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -882,6 +899,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1024,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the visibility map status allows it, bypass pruning and freezing
+	 * entirely. This must be done after fixing any discrepancy between the
+	 * page-level visibility hint and the VM.
+	 */
+	if (prstate.fast_path)
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad7a3290821 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2019,6 +2019,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
+	/*
+	 * Allow skipping full inspection of pages that the VM indicates are
+	 * already all-frozen (which may be scanned due to SKIP_PAGES_THRESHOLD).
+	 * However, if DISABLE_PAGE_SKIPPING was specified, we can't trust the VM,
+	 * so we must examine the page to make sure it is truly all-frozen and fix
+	 * it otherwise.
+	 */
+	if (vacrel->skipwithvm)
+		params.options |= HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+
 	heap_page_prune_and_freeze(&params,
 							   &presult,
 							   &vacrel->offnum,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..0b571d7089f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
-- 
2.43.0
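
The fast path added in the patch above can be sketched outside the PostgreSQL tree as follows. This is a minimal, hedged illustration and not the patch's code: every `sketch_`/`SKETCH_` name is hypothetical, the line pointer states are simplified, and the eligibility rule is an assumption inferred from the Assert in heap_page_bypass_prune_freeze() plus the HEAP_PAGE_PRUNE_ALLOW_FAST_PATH option (the diff excerpt does not show how prstate.fast_path is actually computed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the visibility map flag bits. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Simplified line pointer states; a real ItemId carries more detail. */
typedef enum
{
	SKETCH_LP_UNUSED,
	SKETCH_LP_NORMAL,
	SKETCH_LP_REDIRECT,
	SKETCH_LP_DEAD
} SketchLpState;

/*
 * Assumed eligibility rule: the caller must opt in, and the page must be
 * either all-frozen, or all-visible with no freezing being attempted.
 */
bool
sketch_can_bypass(uint8_t vmbits, bool attempt_freeze, bool allow_fast_path)
{
	if (!allow_fast_path)
		return false;
	if (vmbits & SKETCH_VM_ALL_FROZEN)
		return true;
	return (vmbits & SKETCH_VM_ALL_VISIBLE) != 0 && !attempt_freeze;
}

/*
 * On an all-visible page every normal line pointer is a live tuple, so
 * counting normal items is enough for the live tuple count.
 */
int
sketch_count_live(const SketchLpState *items, size_t nitems)
{
	int			live = 0;

	assert(items != NULL || nitems == 0);

	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] == SKETCH_LP_NORMAL)
			live++;
	}
	return live;
}
```

In this sketch, a caller that wants to verify the VM rather than trust it (as with DISABLE_PAGE_SKIPPING) would pass allow_fast_path = false and always take the full inspection path.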



  [text/x-patch] v40-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.5K, 4-v40-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 255fc9aeb721ba96ee3a7b7c3e675a4ee11087d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v40 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. Doing so examines more tuple xmins than before,
but the additional cost should not be significant, and it will enable us
to set the visibility map on access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bf740c37f3d..c85e4172ee8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1043,6 +1043,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1710,29 +1721,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad7a3290821..7097aa7b772 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2064,13 +2064,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2826,7 +2823,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3587,14 +3584,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3615,7 +3612,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3634,7 +3631,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3715,7 +3712,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3724,16 +3721,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3762,6 +3760,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b571d7089f..9312886ad4b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
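
The deferred check described in the v40-0003 commit message (track the newest live xmin while scanning, then test it once per page) can be sketched in isolation. This is an illustration under stated assumptions, not the patch's implementation: all `sketch_`/`SKETCH_` names are hypothetical, and GlobalVisState is reduced to a single horizon XID, whereas the real structure keeps separate bounds and may be refreshed. The XID comparisons use modulo-2^32 arithmetic in the style of PostgreSQL's TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/* Special XIDs (invalid, bootstrap, frozen) sort below the normal ones. */
bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Circular "a is newer than b" comparison for normal XIDs. */
bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	if (!sketch_xid_is_normal(a) || !sketch_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Simplifying assumption: an XID may still be considered running by some
 * snapshot unless it is strictly older than a single horizon XID.
 */
bool
sketch_xid_maybe_running(SketchXid horizon, SketchXid xid)
{
	return (int32_t) (xid - horizon) >= 0;
}

/*
 * Scan the xmins of committed live tuples, remembering only the newest
 * one, then run the comparatively expensive horizon test once.
 */
bool
sketch_page_all_visible(const SketchXid *xmins, size_t ntuples,
						SketchXid horizon)
{
	SketchXid	newest = SKETCH_INVALID_XID;

	assert(xmins != NULL || ntuples == 0);

	for (size_t i = 0; i < ntuples; i++)
	{
		if (sketch_xid_is_normal(xmins[i]) &&
			sketch_xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	/* If the newest xmin is visible to everyone, older ones are too. */
	if (sketch_xid_is_normal(newest) &&
		sketch_xid_maybe_running(horizon, newest))
		return false;

	return true;
}
```

The point of this design is that the per-tuple work is a cheap comparison, while the horizon test runs at most once per page no matter how many tuples the page holds.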



  [text/x-patch] v40-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.9K, 5-v40-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c85e4172ee8..d276770b9b4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*
 	 * True if the page can bypass full page inspection during pruning and
 	 * freezing based on its visibility map status and the caller's options.
@@ -166,11 +169,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -180,7 +178,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -451,53 +448,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -727,7 +706,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -982,9 +960,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. presult->set_all_frozen is always false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1049,9 +1026,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1202,7 +1179,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1662,6 +1639,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1709,32 +1687,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7097aa7b772..4d52de1a96c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2799,7 +2799,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2825,14 +2825,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2873,7 +2873,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3586,7 +3586,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3594,7 +3594,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3617,7 +3617,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3645,7 +3645,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3734,9 +3734,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3766,8 +3766,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



  [text/x-patch] v40-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 6-v40-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 05dfe8841e4a90dc595775863d58bacce996d70b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v40 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d276770b9b4..633d44adb03 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -163,21 +182,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -215,7 +219,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -381,17 +385,18 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * If the page is already all-frozen, or already all-visible when freezing
 	 * is not being attempted, we can skip pruning and freezing entirely.
 	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
 	 */
-	prstate->fast_path = ((prstate->vmbits & VISIBILITYMAP_ALL_FROZEN) ||
-						  ((prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+	prstate->fast_path = ((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
 						   !prstate->attempt_freeze)) &&
 		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
 
@@ -856,7 +861,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -874,7 +879,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -903,15 +944,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -941,7 +980,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -956,12 +996,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -989,15 +1027,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1118,6 +1185,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1125,29 +1213,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1157,33 +1228,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d52de1a96c..5ea96087fad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2048,29 +2039,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2091,6 +2059,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2104,71 +2080,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3582,7 +3493,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9312886ad4b..4ce63990326 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,7 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0



  [text/x-patch] v40-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v40-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From c47a6270a0a0045347cdb4597b957798d21db4aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v40 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)
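
The invariant this change must preserve is that a VM bit is never set
while the page-level PD_ALL_VISIBLE hint is clear (the reverse is
tolerated). A toy model of the empty-page case is sketched below. All
names here (ToyHeapPage, toy_vm_consistent, toy_mark_empty_page) are
hypothetical, and the bit values are assumed to match the
VISIBILITYMAP_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN */
#define TOY_VM_ALL_VISIBLE	0x01
#define TOY_VM_ALL_FROZEN	0x02

/* Toy stand-in for a heap page: only the page-level hint is modeled */
typedef struct ToyHeapPage
{
	bool		pd_all_visible;
} ToyHeapPage;

/* The invariant: any VM bit implies that the page-level hint is set */
static bool
toy_vm_consistent(const ToyHeapPage *page, uint8_t vmbits)
{
	return vmbits == 0 || page->pd_all_visible;
}

/*
 * Mark an empty page all-visible and all-frozen. The page hint is set
 * before the VM bits so that the invariant holds at every step, and both
 * changes would be covered by one WAL record in the patched code.
 */
static void
toy_mark_empty_page(ToyHeapPage *page, uint8_t *vmbits)
{
	page->pd_all_visible = true;
	*vmbits = (uint8_t) (*vmbits | TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN);
}
```

An empty page has no tuples whose visibility could conflict with a
standby snapshot, which is consistent with the patch passing
InvalidTransactionId as the conflict XID for this record.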

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5ea96087fad..9bfe3c545ff 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v40-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 8-v40-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 181c83f0652bfebe0db2f11983ad08b52c8c780b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v40 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d9042e1f91d 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 633d44adb03..ba00521d834 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1202,8 +1202,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfe3c545ff..93a4437f29b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2804,9 +2804,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..3102c61125e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4409,7 +4409,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v40-0008-Track-which-relations-are-modified-by-a-query.patch (5.8K, 9-v40-0008-Track-which-relations-are-modified-by-a-query.patch)
From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v40 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the executor state.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
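Illustration only, not part of the patch: a minimal sketch of the idea in
the commit message above, using a fixed-width bitmap as a stand-in for
PostgreSQL's Bitmapset. RelidSet, the relids_* helpers, and
scan_may_set_vm() are hypothetical names chosen for this sketch; the patch
itself uses bms_add_member() and bms_is_subset() on es_modified_relids.
The sketch assumes RT indexes stay below 64 and that a later consumer
simply checks membership before letting a scan set the VM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means RT index N is a member. */
typedef uint64_t RelidSet;

/* Add RT index rti (1..63 in this sketch) and return the updated set. */
static RelidSet
relids_add_member(RelidSet set, unsigned rti)
{
	assert(rti >= 1 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

/* True if RT index rti is in the set. */
static bool
relids_is_member(RelidSet set, unsigned rti)
{
	return rti < 64 && (set & (UINT64_C(1) << rti)) != 0;
}

/* True if every member of a is also a member of b. */
static bool
relids_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/*
 * Assumed consumer of the hint: a scan is allowed to set the VM only if
 * its relation is not expected to be modified by the same query.
 */
static bool
scan_may_set_vm(RelidSet modified, unsigned scan_rti)
{
	return !relids_is_member(modified, scan_rti);
}
```

Because the set is only a hint, over-inclusion (for example, a row mark
type that never modifies a tuple) costs a missed VM update at worst, which
is why a subset check rather than an equality check is enough for the
assertion.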
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..8d22b6db867 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -922,6 +922,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3048,6 +3056,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3180,6 +3194,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..7dfa95c2cbe 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,6 +125,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -873,6 +875,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -898,6 +927,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..080cfdac48e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -707,6 +707,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..d2f4f8ea748 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -688,6 +688,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations that are the target of INSERT, UPDATE, DELETE,
+	 * or MERGE, or that have any row mark (e.g. SELECT FOR UPDATE/SHARE).
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v40-0009-Thread-flags-through-begin-scan-APIs.patch (28.1K, 10-v40-0009-Thread-flags-through-begin-scan-APIs.patch)
From 05d736fb5b0600effede5e030d5b929274dabe2c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v40 09/12] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
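Illustration only, not part of the patch: a rough sketch of what
"threading flags through" means here. An outer begin-scan helper accepts a
flags word and forwards it unchanged to the fetch-begin layer, and
existing callers pass 0 so their behavior does not change. All names
below (DEMO_SCAN_READ_ONLY_REL, DemoFetchState, DemoIndexScan,
demo_fetch_begin, demo_index_beginscan) are hypothetical; the real flag
bits are defined by follow-up patches and are not shown in this message.

```c
#include <stdint.h>

/* Hypothetical flag bit; the real flag names are not part of this patch. */
#define DEMO_SCAN_READ_ONLY_REL (UINT32_C(1) << 0)

/* Stand-in for the table AM's per-scan fetch state. */
typedef struct DemoFetchState
{
	uint32_t	flags;		/* context passed down by the caller */
} DemoFetchState;

/* Stand-in for an index scan descriptor that owns a fetch state. */
typedef struct DemoIndexScan
{
	int			nkeys;
	DemoFetchState heapfetch;
} DemoIndexScan;

/* Innermost layer: records the flags for later use by the AM. */
static DemoFetchState
demo_fetch_begin(uint32_t flags)
{
	DemoFetchState st = {.flags = flags};

	return st;
}

/* Outer helper: forwards flags without interpreting them. */
static DemoIndexScan
demo_index_beginscan(int nkeys, uint32_t flags)
{
	DemoIndexScan scan;

	scan.nkeys = nkeys;
	scan.heapfetch = demo_fetch_begin(flags);
	return scan;
}
```

Keeping the outer helper ignorant of individual bits means a new flag only
needs changes at the call site that sets it and in the AM code that reads
it.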
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  6 ++--
 src/backend/access/index/genam.c          |  4 +--
 src/backend/access/index/indexam.c        |  8 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 13 +++++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++---
 src/backend/commands/typecmds.c           |  4 +--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++---
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  6 ++--
 src/backend/executor/nodeIndexscan.c      |  8 +++---
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 ++--
 src/backend/executor/nodeTidrangescan.c   |  6 ++--
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  5 ++--
 src/include/access/heapam.h               |  5 ++--
 src/include/access/tableam.h              | 35 ++++++++++++++---------
 25 files changed, 81 insertions(+), 65 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..1e950d8e6e5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,7 +80,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -762,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..87219613f0b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +593,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..e946cfb393a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -190,12 +191,14 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
 
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
 	/* disable syncscan in parallel tid range scan. */
@@ -248,7 +251,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..fb791c7990b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1160,7 +1160,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 67e42e5df29..cc2ec9393a8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22881,7 +22881,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23345,7 +23345,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..17bf4976cce 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -790,7 +790,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +856,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..88bdf0a52d1 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +209,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1726,7 +1726,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1790,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..db102803eb5 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +184,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4ce63990326..3820bbd7f9f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -96,8 +96,9 @@ typedef struct HeapScanDescData
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
 	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map.
 	 */
 	Buffer		rs_vmbuffer;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f1065e30638 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -418,9 +418,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * 'flags' is a bitmask of SO_* flags providing hints from the executor
+	 * about the scan context.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +897,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +942,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -957,9 +960,9 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	flags |= SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
 		flags |= SO_ALLOW_STRAT;
@@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
 
@@ -1139,7 +1143,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1154,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,10 +1178,13 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * 'flags' is a bitmask of SO_* flags providing hints from the executor about
+ * the scan context.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1194,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
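
The wrappers changed above (table_beginscan(), table_beginscan_bm(), table_beginscan_tidrange(), and so on) no longer initialize a local `flags`; each one ORs its scan-type bits into whatever the caller passed. A minimal standalone sketch of that pattern follows. The `DEMO_` bit values and the `demo_beginscan_flags()` helper are hypothetical and exist only for illustration; the real SO_* definitions are in tableam.h and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only so the sketch compiles. */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_STRAT = 1 << 1,
	DEMO_SO_ALLOW_SYNC = 1 << 2,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 3,
	DEMO_SO_CALLER_HINT = 1 << 4	/* stands in for any caller-supplied hint */
};

/*
 * Same shape as the patched table_beginscan(): keep the caller's bits and
 * add the bits this wrapper is responsible for.
 */
static uint32_t
demo_beginscan_flags(uint32_t flags)
{
	flags |= DEMO_SO_TYPE_SEQSCAN |
		DEMO_SO_ALLOW_STRAT | DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE;

	return flags;
}
```

Passing 0, as nearly every call site in this patch does, yields exactly the flag set the wrapper produced before the change, so the extra parameter does not alter behavior on its own.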



  [text/x-patch] v40-0010-Pass-down-information-on-table-modification-to-s.patch (14.5K, 11-v40-0010-Pass-down-information-on-table-modification-to-s.patch)
From 7790c8177ba3aa8a8bd1a216ea77fdfd42efc1bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v40 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 ++++++-
 src/backend/executor/nodeIndexonlyscan.c  | 25 +++++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 32 ++++++++++++++++++++---
 src/backend/executor/nodeSamplescan.c     |  8 +++++-
 src/backend/executor/nodeSeqscan.c        | 26 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 24 ++++++++++++++---
 src/include/access/heapam.h               |  6 +++++
 src/include/access/tableam.h              |  3 +++
 9 files changed, 119 insertions(+), 15 deletions(-)
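
The scan nodes in this patch all repeat one check: if the scanned relation's range-table index is absent from es_modified_relids, set SO_HINT_REL_READ_ONLY; the heap AM then derives modifies_base_rel from that bit. A minimal standalone sketch of that round trip follows, assuming a 64-bit mask in place of a Bitmapset. Every `demo_` and `DEMO_` name is hypothetical, and the bit value is not PostgreSQL's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SO_HINT_REL_READ_ONLY (1u << 0)	/* hypothetical bit value */

/* Stand-in for bms_is_member(), limited to relids 0..63. */
static bool
demo_is_member(unsigned relid, uint64_t modified_relids)
{
	return relid < 64 && ((modified_relids >> relid) & 1) != 0;
}

/* Executor side: what each scan node computes before beginning the scan. */
static uint32_t
demo_scan_flags(unsigned scanrelid, uint64_t modified_relids)
{
	uint32_t	flags = 0;

	if (!demo_is_member(scanrelid, modified_relids))
		flags |= DEMO_SO_HINT_REL_READ_ONLY;

	return flags;
}

/* AM side: same expression as the line added to heapam_index_fetch_begin(). */
static bool
demo_modifies_base_rel(uint32_t flags)
{
	return !(flags & DEMO_SO_HINT_REL_READ_ONLY);
}
```

Because the hint is expressed as "read only" rather than "modified", a caller that passes 0 is treated as possibly modifying the relation, which looks like the conservative default for the non-executor call sites.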

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1e950d8e6e5..aec5199b2e6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -87,6 +87,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..0f30e6980de 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 17bf4976cce..3fab715f879 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -95,7 +101,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -763,6 +770,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -784,13 +792,18 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -831,6 +844,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -850,13 +864,18 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 88bdf0a52d1..6a235ef25ce 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,6 +104,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -113,7 +119,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -200,6 +207,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -209,7 +222,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1699,6 +1713,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1720,13 +1735,17 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1765,6 +1784,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1784,13 +1804,17 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..8d36fcda48a 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,19 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) scanstate->ss.ps.plan)->scanrelid,
+						   scanstate->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..9356973802b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +375,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +418,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..04a75e72fe1 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,16 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = 0;
+
+			if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+							   estate->es_modified_relids))
+				flags |= SO_HINT_REL_READ_ONLY;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +458,21 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +502,15 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3820bbd7f9f..1a7306e2935 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -132,6 +132,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f1065e30638..57ce94a386f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
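
A note for readers following along: the same three-line check is repeated in every scan node touched by the patch above. Below is a minimal standalone sketch of that check, not PostgreSQL code. The `ModifiedRelids` type and both helper functions are hypothetical stand-ins (the patch uses a Bitmapset and bms_is_member()); only the SO_HINT_REL_READ_ONLY bit position is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns in ScanOptions. */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for es_modified_relids: one bit per range table
 * index, limited to 64 entries for this sketch.
 */
typedef uint64_t ModifiedRelids;

/* Stand-in for bms_is_member(scanrelid, es_modified_relids). */
static bool
relid_is_modified(ModifiedRelids modified, unsigned scanrelid)
{
	assert(scanrelid < 64);
	return (modified & (UINT64_C(1) << scanrelid)) != 0;
}

/* Mirrors the check each scan node performs before beginning its scan. */
static uint32_t
scan_flags_for(ModifiedRelids modified, unsigned scanrelid)
{
	uint32_t	flags = 0;

	if (!relid_is_modified(modified, scanrelid))
		flags |= SO_HINT_REL_READ_ONLY;

	return flags;
}
```

With a query that modifies only range table entry 2, scan_flags_for() returns the hint for a scan of entry 1 and returns 0 for a scan of entry 2.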



  [text/x-patch] v40-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.5K, 12-v40-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 0a16dad7a4ebe224f35629a39619d0feb03f03a3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v40 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 46 ++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 +++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index aec5199b2e6..17d625944e8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2543,7 +2544,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba00521d834..4475457fdde 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -219,7 +221,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -239,7 +242,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -321,6 +325,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -377,6 +383,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -456,9 +463,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -889,21 +895,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1118,7 +1140,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93a4437f29b..1ddd31c7ead 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1a7306e2935..e9617b1e666 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -98,7 +99,8 @@ typedef struct HeapScanDescData
 	/*
 	 * For sequential scans, bitmap heap scans, TID range scans, and sample
 	 * scans. The current heap block's corresponding page in the visibility
-	 * map.
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -130,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -441,7 +447,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
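
A note for readers following along: the decision order in heap_page_will_set_vm() after patch 11 can be sketched as a pure function, shown below. This is a simplified model under stated assumptions, not the patch itself. The `VmDecisionInput` struct is hypothetical, its `buffer_is_dirty` and `needs_fpi` fields stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(), and the sketch leaves out the side effects the real function has on PruneState (resetting set_all_visible/set_all_frozen and filling in new_vmbits).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified: only the two reasons that appear in the patch above. */
typedef enum
{
	PRUNE_ON_ACCESS,
	PRUNE_VACUUM_SCAN
} PruneReason;

/* Hypothetical flattened inputs for this sketch. */
typedef struct VmDecisionInput
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;	/* page found all-visible */
	bool		set_all_frozen;		/* page found all-frozen */
	bool		do_prune;			/* pruning will modify the page */
	bool		do_freeze;			/* freezing will modify the page */
	bool		buffer_is_dirty;	/* stands in for BufferIsDirty() */
	bool		needs_fpi;			/* stands in for XLogCheckBufferNeedsBackup() */
} VmDecisionInput;

static bool
will_set_vm(const VmDecisionInput *in, PruneReason reason)
{
	/* Same invariant the caller asserts: all-frozen implies all-visible. */
	assert(!in->set_all_frozen || in->set_all_visible);

	if (!in->attempt_set_vm)
		return false;

	if (!in->set_all_visible)
		return false;

	/*
	 * On-access with nothing else to do: skip if setting the VM would newly
	 * dirty the heap page or force a full-page image into the WAL.
	 */
	if (reason == PRUNE_ON_ACCESS && !in->do_prune && !in->do_freeze &&
		(!in->buffer_is_dirty || in->needs_fpi))
		return false;

	return true;
}
```

The third check only applies to PRUNE_ON_ACCESS, so vacuum with the same inputs still sets the VM.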



  [text/x-patch] v40-0012-Set-pd_prune_xid-on-insert.patch (10.9K, 13-v40-0012-Set-pd_prune_xid-on-insert.patch)
From e4c7112d49e650f59dab834d3db6007c69f34f1a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v40 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run and set the VM
all-visible after a page is filled with newly inserted tuples the first
time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 17 ++++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4475457fdde..2c49bc72f4b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -261,7 +261,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
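
A note for readers following along: patch 12 guards PageSetPrunable() in heap_insert() with a check on the XID and on HEAP_INSERT_FROZEN. The standalone model below sketches that guard together with the hint's "keep the oldest XID" behavior. It is an illustration under assumptions, not PostgreSQL code: the function names are hypothetical, the page header is reduced to a single pd_prune_xid value, and the XID comparison is the simplified modulo-2^32 form that is only valid for normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Bootstrap and frozen XIDs sort below FirstNormalTransactionId. */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b", valid for normal XIDs only. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Model of the prunable hint: remember the oldest XID seen so far. */
static void
set_prunable(TransactionId *pd_prune_xid, TransactionId xid)
{
	assert(xid_is_normal(xid));

	if (*pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/* The guard applied on insert: skip bootstrap mode and frozen inserts. */
static void
maybe_set_prunable_on_insert(TransactionId *pd_prune_xid,
							 TransactionId xid, bool insert_frozen)
{
	if (xid_is_normal(xid) && !insert_frozen)
		set_prunable(pd_prune_xid, xid);
}
```

A later insert by a newer XID leaves the hint unchanged, so the hint always names the oldest XID that could make the page worth revisiting.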




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-18 17:14  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2026-03-18 17:14 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-03-17 10:48:55 -0400, Melanie Plageman wrote:
> @@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>  		{
>  			OffsetNumber dummy_off_loc;
>  			PruneFreezeResult presult;
> +			PruneFreezeParams params;
> +
> +			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);

We also do a BufferGetBlockNumber(buffer) in prune_freeze_setup().  It irks me
a bit to do that twice, but I don't see a non-ugly way to avoid that.


> +			params.relation = relation;
> +			params.buffer = buffer;
> +			params.vmbuffer = *vmbuffer;
> +			params.reason = PRUNE_ON_ACCESS;
> +			params.vistest = vistest;
> +			params.cutoffs = NULL;
>  

>  			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
> @@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>  			 * cannot safely determine that during on-access pruning with the
>  			 * current implementation.
>  			 */
> -			PruneFreezeParams params = {
> -				.relation = relation,
> -				.buffer = buffer,
> -				.reason = PRUNE_ON_ACCESS,
> -				.options = 0,
> -				.vistest = vistest,
> -				.cutoffs = NULL,
> -			};
> +			params.options = 0;
>  
>  			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
>  									   NULL, NULL);

Why does this change the way the PruneFreezeParams variable is defined?  I
don't really mind, it's just a bit confusing.



> +/*
> + * Helper to fix visibility-related corruption on a heap page and its
> + * corresponding VM page. An all-visible page cannot have dead items nor can
> + * it have tuples that are not visible to all running transactions. It clears
> + * the VM corruption as well as resetting the vmbits used during pruning.

So this is now only called when we already know there's corruption?  I think
that could be clearer in the comments.


Seems a bit odd that the function then figures out what it should do from the
page & VM contents, given that the caller already needs to have known that
something is wrong?


> + * This function must be called while holding an exclusive lock on the heap
> + * buffer, and any dead items must have been discovered under that same lock.
> + * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
> + * buffer is exclusively locked, ensuring that no other backend can update the
> + * VM bits corresponding to this heap page.
> + *
> + * This function makes changes to the VM and, potentially, the heap page, but
> + * it does not need to be done in a critical section.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> +{
> +	const char *relname = RelationGetRelationName(prstate->relation);
> +
> +	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
> +
> +	if (PageIsAllVisible(prstate->page))
> +	{
> +		/*
> +		 * It's possible for the value returned by
> +		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
> +		 * wrong for us to see tuples that appear to not be visible to
> +		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
> +		 * xmin value never moves backwards, but
> +		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
> +		 * returns a value that's unnecessarily small, so if we see that
> +		 * contradiction it just means that the tuples that we think are not
> +		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
> +		 * is correct.
> +		 *
> +		 * However, there should never be LP_DEAD items, dead tuple versions,
> +		 * or tuples inserted by an in-progress transaction on a page with
> +		 * PD_ALL_VISIBLE set.
> +		 */
> +		if (prstate->lpdead_items > 0)
> +		{
> +			ereport(WARNING,
> +					(errcode(ERRCODE_DATA_CORRUPTED),
> +					 errmsg("dead line pointer found on page marked all-visible"),
> +					 errcontext("relation \"%s\", page %u, tuple %u",
> +								relname, prstate->block, offnum)));
> +		}
> +		else
> +		{
> +			ereport(WARNING,
> +					(errcode(ERRCODE_DATA_CORRUPTED),
> +					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
> +					 errcontext("relation \"%s\", page %u, tuple %u",
> +								relname, prstate->block, offnum)));
> +		}

Wait, why are we now WARNING about the PageIsAllVisible() &&
prstate->lpdead_items == 0 case? Seems to run flatly counter to the comment
above about GetOldestNonRemovableTransactionId() going backward?


> +		/*
> +		 * Mark the buffer dirty now in case we make no further changes and
> +		 * therefore would not mark it dirty later.
> +		 */
> +		PageClearAllVisible(prstate->page);
> +		MarkBufferDirtyHint(prstate->buffer, true);
> +	}
> +	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> +	{
> +		/*
> +		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
> +		 * the page-level bit is clear. However, for vacuum, it's possible
> +		 * that the bit got cleared after heap_vac_scan_next_block() was
> +		 * called, so we must recheck now that we have the buffer lock before
> +		 * concluding that the VM is corrupt.
> +		 */
> +		ereport(WARNING,
> +				(errcode(ERRCODE_DATA_CORRUPTED),
> +				 errmsg("page is not marked all-visible but visibility map bit is set"),
> +				 errcontext("relation \"%s\", page %u",
> +							relname, prstate->block)));
> +	}
> +
> +	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
> +						VISIBILITYMAP_VALID_BITS);
> +	prstate->vmbits = 0;

So we can end up clearing the VM without emitting any warning?


>  /*
>   * Prune and repair fragmentation and potentially freeze tuples on the
> @@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  					   new_relfrozen_xid, new_relmin_mxid,
>  					   presult, &prstate);
>  
> +	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
> +		!PageIsAllVisible(prstate.page))
> +		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
> +
>  	/*
>  	 * Examine all line pointers and tuple visibility information to determine
>  	 * which line pointers should change state and which tuples may be frozen.

Feels like there should be an explanation here for why we are clearing the VM?




> From a503285e012de12539df384d615675c1e48e5cfd Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 25 Feb 2026 16:48:19 -0500
> Subject: [PATCH v40 02/12] Add pruning fast path for all-visible and
>  all-frozen pages
> 
> Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
> heap_page_prune_and_freeze() can be invoked for pages with no pruning or
> freezing work. To avoid this, if a page is already all-frozen or it is
> all-visible and no freezing will be attempted, we exit early. We can't
> exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.
> 



> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
> +	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> +	Page		page = prstate->page;
> +
> +	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> +		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> +			!prstate->attempt_freeze));
> +
> +	/* We'll fill in presult for the caller */
> +	memset(presult, 0, sizeof(PruneFreezeResult));
> +
> +	presult->vmbits = prstate->vmbits;
> +
> +	/* Clear any stale prune hint */
> +	if (TransactionIdIsValid(PageGetPruneXid(page)))
> +	{
> +		PageClearPrunable(page);
> +		MarkBufferDirtyHint(prstate->buffer, true);
> +	}
> +
> +	if (PageIsEmpty(page))
> +		return;
> +
> +	presult->hastup = true;

Is that actually a given? Couldn't the page consist solely of unused
items? That'd make PageIsEmpty() return false, but should still allow
truncation.
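
To make the concern concrete, here is a minimal standalone sketch. It uses
simplified stand-in types and hypothetical sketch_* names, not the real
Page/ItemId definitions, and it deliberately ignores which line pointer states
should count toward hastup. It only shows that "not empty" and "has something
in use" are different conditions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for line pointer states (hypothetical names). */
typedef enum
{
	SKETCH_LP_UNUSED,
	SKETCH_LP_NORMAL,
	SKETCH_LP_REDIRECT,
	SKETCH_LP_DEAD
} SketchLpState;

/* Models the idea behind PageIsEmpty(): there are no line pointers at all. */
static bool
sketch_page_is_empty(size_t nitems)
{
	return nitems == 0;
}

/* True if at least one line pointer is something other than unused. */
static bool
sketch_page_has_used_item(const SketchLpState *items, size_t nitems)
{
	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] != SKETCH_LP_UNUSED)
			return true;
	}
	return false;
}
```

A page with three unused line pointers is not empty by the first test, yet the
second test finds nothing in use, which is the case where setting hastup
unconditionally looks questionable.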





> From 255fc9aeb721ba96ee3a7b7c3e675a4ee11087d6 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 17 Dec 2025 16:51:05 -0500
> Subject: [PATCH v40 03/12] Use GlobalVisState in vacuum to determine page
>  level visibility
> 
> During vacuum's first and third phases, we examine tuples' visibility
> to determine if we can set the page all-visible in the visibility map.
> 
> Previously, this check compared tuple xmins against a single XID chosen at
> the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> enables future work to set the VM during on-access pruning, since ordinary
> queries have access to GlobalVisState but not OldestXmin.
> 
> This also benefits vacuum: in some cases, GlobalVisState may advance
> during a vacuum, allowing more pages to become considered all-visible.
> And, in the future, we could easily add a heuristic to update
> GlobalVisState more frequently during vacuums of large tables.
> 
> OldestXmin is still used for freezing and as a backstop to ensure we
> don't freeze a dead tuple that wasn't yet prunable according to
> GlobalVisState in the rare occurrences where GlobalVisState moves
> backwards.

> Because comparing a transaction ID against GlobalVisState is more
> expensive than comparing against a single XID, we defer this check until
> after scanning all tuples on the page. Therefore, we perform the
> GlobalVisState check only once per page. This is safe because
> visibility_cutoff_xid records the newest live xmin on the page;
> if it is globally visible, then the entire page is all-visible.
> 
> Using GlobalVisState means on-access pruning can also maintain
> visibility_cutoff_xid. This approach will result in examining more tuple
> xmins than before; however, the additional cost should not be
> significant. And doing so will enable us to set the visibility map on
> access in the future.


I wish there were a good way to trigger errors if visibility_cutoff_xid were
ever read after prstate->set_all_frozen is set to false... But I guess that'll
be moot in a few commits.



> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> index bf740c37f3d..c85e4172ee8 100644
> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -1043,6 +1043,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  	 */
>  	prune_freeze_plan(&prstate, off_loc);
>  
> +	/*
> +	 * After processing all the live tuples on the page, if the newest xmin
> +	 * amongst them may be considered running by any snapshot, the page cannot
> +	 * be all-visible.
> +	 */
> +	if (prstate.set_all_visible &&
> +		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
> +		GlobalVisTestXidMaybeRunning(prstate.vistest,
> +									 prstate.visibility_cutoff_xid))
> +		prstate.set_all_visible = prstate.set_all_frozen = false;
> +

So the docs for prstate.visibility_cutoff_xid say:

	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
	 * The caller can use it as the conflict horizon, when setting the VM
	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
	 * true.

But here we look at it without checking that we froze some tuples.  I guess
the comment is outdated?



> @@ -3615,7 +3612,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>   * Returns true if the page is all-visible other than the provided
>   * deadoffsets and false otherwise.
>   *
> - * OldestXmin is used to determine visibility.
> + * vistest is used to determine visibility.
>   *
>   * Output parameters:
>   *

Could the "going backward" thing possibly trigger a spurious assert in

        Assert(heap_page_is_all_visible(vacrel->rel, buf,
                                        vacrel->vistest, &debug_all_frozen,
                                        &debug_cutoff, &vacrel->offnum));



> From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 28 Feb 2026 16:06:51 -0500
> Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
>  all-visible

I guess I'd have expected 03 and 04 to be swapped... But whatever.


> +	 * Currently, only VACUUM performs freezing, but other callers may in the
> +	 * future. Other callers must initialize prstate.set_all_frozen to false,
> +	 * since we will not call heap_prepare_freeze_tuple() for each tuple.

What does it mean that other callers need to "initialize
prstate.set_all_frozen to false"? It's not like they can do that, because
prstate is defined in heap_page_prune_and_freeze().


> +	 * We only consider opportunistic freezing if the page would become
> +	 * all-frozen, or if it would be all-frozen except for dead tuples that
> +	 * VACUUM will remove.

It kinda feels like "opportunistic freezing" is not defined at this point.  It
wasn't super clear before either, but there was at least this:

-     * In addition to telling the caller whether it can set the VM bit, we
-     * also use 'set_all_visible' and 'set_all_frozen' for our own
-     * decision-making. If the whole page would become frozen, we consider
-     * opportunistically freezing tuples.  We will not be able to freeze the
-     * whole page if there are tuples present that are not visible to everyone
-     * or if there are dead tuples which are not yet removable.  However, dead
-     * tuples which will be removed by the end of vacuuming should not
-     * preclude us from opportunistically freezing.  Because of that, we do

Which seems to provide a bit more explanation than "We only consider
opportunistic freezing"...


> From 05dfe8841e4a90dc595775863d58bacce996d70b Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 2 Dec 2025 15:07:42 -0500
> Subject: [PATCH v40 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
>  prune/freeze

"Eliminate" kinda makes me think this just removes WAL logging for
visibilitymap sets or such. Perhaps consider rephrasing it as something like
"WAL log setting VM as part of XLOG_HEAP2_PRUNE_*"


> This change applies only to vacuum phase I, not to pruning performed
> during normal page access.

Maybe + "For now this ..."


> @@ -215,7 +219,7 @@ static void page_verify_redirects(Page page);
>  
>  static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
>  								  PruneState *prstate);
> -
> +static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
>  
>  /*
>   * Optionally prune and repair fragmentation in the specified page.

Previously there were two newlines between the declarations and code, now only
one. Intentional?




> +/*
> + * Decide whether to set the visibility map bits (all-visible and all-frozen)
> + * for heap_blk using information from the PruneState and VM.
> + *
> + * This function does not actually set the VM bits or page-level visibility
> + * hint, PD_ALL_VISIBLE.
> + *
> + * Returns true if one or both VM bits should be set and false otherwise.
> + */
> +static bool
> +heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
> +{
> +	/*
> +	 * Though on-access pruning maintains prstate->set_all_visible, we don't
> +	 * consider setting the VM.
> +	 */
> +	if (reason == PRUNE_ON_ACCESS)
> +		return false;

Nitpick^2: We kind of are considering based on this comment :).  I'd just
s/consider setting/set/, maybe with a +for now.


> @@ -956,12 +996,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
>   * tuples if it's required in order to advance relfrozenxid / relminmxid, or
>   * if it's considered advantageous for overall system performance to do so
>   * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
> - * 'new_relmin_mxid' arguments are required when freezing.  When
> - * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
> - * presult->set_all_visible and presult->set_all_frozen after determining
> - * whether or not to opportunistically freeze, to indicate if the VM bits can
> - * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
> - * option is not passed.
> + * 'new_relmin_mxid' arguments are required when freezing.
> + *
> + * A vmbuffer corresponding to the heap page is also passed and if the page is
> + * found to be all-visible/all-frozen, we will set it in the VM.
>   *
>   * presult contains output parameters needed by callers, such as the number of
>   * tuples removed and the offsets of dead items on the page after pruning.
> @@ -989,15 +1027,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  	bool		do_freeze;
>  	bool		do_prune;
>  	bool		do_hint_prune;
> +	bool		do_set_vm;
>  	bool		did_tuple_hint_fpi;
>  	int64		fpi_before = pgWalUsage.wal_fpi;
> +	TransactionId conflict_xid = InvalidTransactionId;
>  
>  	/* Initialize prstate */
>  	prune_freeze_setup(params,
>  					   new_relfrozen_xid, new_relmin_mxid,
>  					   presult, &prstate);
>  
> -	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
> +	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
>  		!PageIsAllVisible(prstate.page))
>  		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);

There are so many changes related to s/vmbits/old_vmbits/. How about naming it
old_vmbits from the start? That'll make this commit a lot less noisy.



> @@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  		prstate.set_all_visible = prstate.set_all_frozen = false;
>  
>  	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
> +	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));

Why didn't we have this assert earlier?


> +	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);

Most of the other heap_page_prune_and_freeze() helpers are named
heap_prune_xyz(), why not follow that here?

I guess this holds for a few other helpers added in earlier commits
too. E.g. heap_page_bypass_prune_freeze() should probably be
heap_prune_bypass_prune_freeze() or such.


> +	/*
> +	 * new_vmbits should be 0 regardless of whether or not the page is
> +	 * all-visible if we do not intend to set the VM.
> +	 */
> +	Assert(do_set_vm || prstate.new_vmbits == 0);
> +
> +	/*
> +	 * The snapshot conflict horizon for the whole record is the most
> +	 * conservative (newest) horizon required by any change in the record.
> +	 */
> +	if (do_set_vm)
> +		conflict_xid = prstate.newest_live_xid;
> +	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
> +		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
> +	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
> +		conflict_xid = prstate.latest_xid_removed;

I guess I'd personally move the initialization of conflict_xid to
InvalidTransactionId to just before the if, to make it clearer where we start
from if !do_set_vm.
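
Roughly what I have in mind, as a standalone sketch rather than a proposed
hunk. The sketch_* names and the SketchXid type are made up for illustration;
sketch_xid_follows() is modeled on the wraparound-aware comparison that
TransactionIdFollows() performs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for TransactionId and its special values (hypothetical names). */
typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	/* Special XIDs compare as plain integers. */
	if (a < SKETCH_FIRST_NORMAL_XID || b < SKETCH_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Pick the newest horizon required by any change going into the record. */
static SketchXid
sketch_conflict_xid(bool do_set_vm, SketchXid newest_live_xid,
					bool do_freeze, SketchXid freeze_conflict_xid,
					bool do_prune, SketchXid latest_xid_removed)
{
	/* Start point is visible right next to the chain of comparisons. */
	SketchXid	conflict_xid = SKETCH_INVALID_XID;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && sketch_xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && sketch_xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

With the initialization adjacent to the first if, it is obvious that the
!do_set_vm case starts from the invalid XID and that any normal XID from
freezing or pruning then wins.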


> @@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  
>  		/*
>  		 * If that's all we had to do to the page, this is a non-WAL-logged
> -		 * hint.  If we are going to freeze or prune the page, we will mark
> -		 * the buffer dirty below.
> +		 * hint.  If we are going to freeze or prune the page or set
> +		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
> +		 *
> +		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
> +		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
>  		 */
> -		if (!do_freeze && !do_prune)
> +		if (!do_freeze && !do_prune && !do_set_vm)
>  			MarkBufferDirtyHint(prstate.buffer, true);
>  	}

This block is gated by if (do_hint_prune) which is computed as:

	/*
	 * Even if we don't prune anything, if we found a new value for the
	 * pd_prune_xid field or the page was marked full, we will update the hint
	 * bit.
	 */
	do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
		PageIsFull(prstate.page);

It's not really related to this change, but I'm just confused a bit by the
"|| PageIsFull(prstate.page)". What is that about? Why do we want to mark the
buffer DirtyHint if the page is full? It very well might already have been
marked as such, no?



> -	if (do_prune || do_freeze)
> +	if (do_prune || do_freeze || do_set_vm)
>  	{
>  		/* Apply the planned item changes and repair page fragmentation. */
>  		if (do_prune)
> @@ -1118,6 +1185,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  		if (do_freeze)
>  			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
>  
> +		/* Set the visibility map and page visibility hint */
> +		if (do_set_vm)
> +		{
> +			/*
> +			 * While it is valid for PD_ALL_VISIBLE to be set when the
> +			 * corresponding VM bit is clear, we strongly prefer to keep them
> +			 * in sync.
> +			 *
> +			 * The heap buffer must be marked dirty before adding it to the
> +			 * WAL chain when setting the VM. We don't worry about
> +			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
> +			 * already set, though. It is extremely rare to have a clean heap
> +			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
> +			 * so there is no point in optimizing it.
> +			 */
> +			PageSetAllVisible(prstate.page);
> +			PageClearPrunable(prstate.page);

Idle thought, not to be acted on now: Eventually it could make sense to not do
PageClearPrunable() if we are not marking the page frozen, but instead replace
the prune xid with something triggering on-access pruning when freezing is
reasonable.


> +	/*
> +	 * During its second pass over the heap, VACUUM calls
> +	 * heap_page_would_be_all_visible() to determine whether a page is
> +	 * all-visible and all-frozen. The logic here is similar. After completing
> +	 * pruning and freezing, use an assertion to verify that our results
> +	 * remain consistent with heap_page_would_be_all_visible().
> +	 */
> +#ifdef USE_ASSERT_CHECKING
> +	if (prstate.set_all_visible)
> +	{
> +		TransactionId debug_cutoff;
> +		bool		debug_all_frozen;
> +
> +		Assert(prstate.lpdead_items == 0);
> +
> +		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
> +										prstate.vistest,
> +										&debug_all_frozen,
> +										&debug_cutoff, off_loc));
> +
> +		/*
> +		 * It's possible the page is composed entirely of frozen tuples but is
> +		 * not set all-frozen in the VM and did not pass
> +		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
> +		 * heap_page_is_all_visible() finds the page completely frozen, even
> +		 * though prstate.set_all_frozen is false.
> +		 */
> +		Assert(!prstate.set_all_frozen || debug_all_frozen);

Seems like we could verify that debug_cutoff isn't newer than conflict_xid?


> +	}
> +#endif

Hm.  I guess aborting after we did incorrect pruning/freezing/VMing is better
than not, but it'd be even better if we did it before corrupting things. But I
guess it wouldn't be trivial to add something like the debug_cutoff assertion I
suggest above, when freezing of tuples is only executed after
heap_page_is_all_visible() (for dead tuples heap_page_would_be_all_visible()
already has provisions).

It's probably more a theoretical concern than a real worry.

> +	presult->new_all_visible_pages = 0;
> +	presult->new_all_frozen_pages = 0;
> +	presult->new_all_visible_frozen_pages = 0;

Isn't it odd to talk about pages here? Given that heap_page_prune_and_freeze()
only ever operates on exactly one page.  Is that just so you can do

> +	vacrel->new_all_visible_pages += presult.new_all_visible_pages;

etc?


> +	if (do_set_vm)
> +	{
> +		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> +		{
> +			presult->new_all_visible_pages = 1;
> +			if (prstate.set_all_frozen)
> +				presult->new_all_visible_frozen_pages = 1;
> +		}
> +		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> +				 prstate.set_all_frozen)
> +			presult->new_all_frozen_pages = 1;
> +	}
> +
>  	if (prstate.attempt_freeze)
>  	{
>  		if (presult->nfrozen > 0)

Feels like this is kinda redoing what heap_page_will_set_vm already did.
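
One possible shape, purely as a sketch: derive the counters from the old and
new VM bits alone. This assumes new_vmbits is 0 whenever the VM will not be
set (which the patch asserts just above). The sketch_* names, the struct, and
the bit values are hypothetical stand-ins:

```c
#include <stdint.h>

/* Stand-ins for the VM bit flags (hypothetical names and values). */
#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

typedef struct
{
	int			new_all_visible;
	int			new_all_visible_frozen;
	int			new_all_frozen;
} SketchVmCounts;

/* Classify the VM transition for one page from its old and new bits. */
static SketchVmCounts
sketch_count_vm_transition(uint8_t old_vmbits, uint8_t new_vmbits)
{
	SketchVmCounts counts = {0, 0, 0};

	if (!(old_vmbits & SKETCH_VM_ALL_VISIBLE) &&
		(new_vmbits & SKETCH_VM_ALL_VISIBLE))
	{
		/* Page becomes all-visible, possibly all-frozen at the same time. */
		counts.new_all_visible = 1;
		if (new_vmbits & SKETCH_VM_ALL_FROZEN)
			counts.new_all_visible_frozen = 1;
	}
	else if (!(old_vmbits & SKETCH_VM_ALL_FROZEN) &&
			 (new_vmbits & SKETCH_VM_ALL_FROZEN))
	{
		/* Page was already all-visible and is now also all-frozen. */
		counts.new_all_frozen = 1;
	}

	return counts;
}
```

That way the decision about which bits to set lives in one place and the
accounting only looks at the before/after state.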


> @@ -472,7 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
>  /* in heap/vacuumlazy.c */
>  extern void heap_vacuum_rel(Relation rel,
>  							const VacuumParams params, BufferAccessStrategy bstrategy);
> -
> +#ifdef USE_ASSERT_CHECKING
> +extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
> +									 GlobalVisState *vistest,
> +									 bool *all_frozen,
> +									 TransactionId *visibility_cutoff_xid,
> +									 OffsetNumber *logging_offnum);
> +#endif
>  /* in heap/heapam_visibility.c */
>  extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
>  										 Buffer buffer);

I'd not remove the newline before "/* in heap/heapam_visibility.c */". Other
"sections" do have that newline before the "/* in $filename */" comment too.


> From c47a6270a0a0045347cdb4597b957798d21db4aa Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 27 Sep 2025 11:55:21 -0400
> Subject: [PATCH v40 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

Same comment about Eliminate as in the prior commit.

Perhaps worth mentioning more explicitly that this doesn't really have an
advantage other than getting rid of the last user of XLOG_HEAP2_VISIBLE?



> @@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
>  			PageSetAllVisible(page);
>  			PageClearPrunable(page);
> -			visibilitymap_set(vacrel->rel, blkno, buf,
> -							  InvalidXLogRecPtr,
> -							  vmbuffer, InvalidTransactionId,
> -							  VISIBILITYMAP_ALL_VISIBLE |
> -							  VISIBILITYMAP_ALL_FROZEN);
> +			visibilitymap_set_vmbits(blkno,
> +									 vmbuffer,
> +									 VISIBILITYMAP_ALL_VISIBLE |
> +									 VISIBILITYMAP_ALL_FROZEN,
> +									 vacrel->rel->rd_locator);
> +
> +			/*
> +			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
> +			 * setting the VM.
> +			 */
> +			if (RelationNeedsWAL(vacrel->rel))
> +				log_heap_prune_and_freeze(vacrel->rel, buf,
> +										  vmbuffer,
> +										  VISIBILITYMAP_ALL_VISIBLE |
> +										  VISIBILITYMAP_ALL_FROZEN,
> +										  InvalidTransactionId, /* conflict xid */
> +										  false,	/* cleanup lock */
> +										  PRUNE_VACUUM_SCAN,	/* reason */
> +										  NULL, 0,
> +										  NULL, 0,
> +										  NULL, 0,
> +										  NULL, 0);
> +
>  			END_CRIT_SECTION();

It's a tad odd that we do:

			/*
			 * It's possible that another backend has extended the heap,
			 * initialized the page, and then failed to WAL-log the page due
			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
			 * might try to replay our record setting the page all-visible and
			 * find that the page isn't initialized, which will cause a PANIC.
			 * To prevent that, check whether the page has been previously
			 * WAL-logged, and if not, do that now.
			 */
			if (RelationNeedsWAL(vacrel->rel) &&
				!XLogRecPtrIsValid(PageGetLSN(page)))
				log_newpage_buffer(buf, true);

if we then immediately afterwards emit a WAL record that could just as well
have included an FPI of the heap page.



> From 181c83f0652bfebe0db2f11983ad08b52c8c780b Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 27 Sep 2025 11:55:36 -0400
> Subject: [PATCH v40 07/12] Remove XLOG_HEAP2_VISIBLE entirely
> 
> There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
> can be removed. This includes deleting the xl_heap_visible struct and
> all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
> records.

> This changes the visibility map API, so any external users/consumers of
> the VM-only WAL record will need to change.

I hope there aren't any. Not sure I can really see scenarios in which that'd
be a safe thing to do from an external user...




> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
> index ce3566ba949..5eed567a8e5 100644
> --- a/src/include/access/heapam_xlog.h
> +++ b/src/include/access/heapam_xlog.h
> @@ -60,7 +60,6 @@
>  #define XLOG_HEAP2_PRUNE_ON_ACCESS      0x10
>  #define XLOG_HEAP2_PRUNE_VACUUM_SCAN    0x20
>  #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
> -#define XLOG_HEAP2_VISIBLE      0x40
>  #define XLOG_HEAP2_MULTI_INSERT 0x50
>  #define XLOG_HEAP2_LOCK_UPDATED 0x60
>  #define XLOG_HEAP2_NEW_CID      0x70
> @@ -443,20 +442,6 @@ typedef struct xl_heap_inplace

I think other places with a gap in the "actions" mention that some value is
now unused.


> diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
> index 8a67bfa1aff..d9042e1f91d 100644
> --- a/src/backend/access/common/bufmask.c
> +++ b/src/backend/access/common/bufmask.c
> @@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
>  
>  	/*
>  	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
> -	 * we don't mark the page all-visible. See heap_xlog_visible() for
> -	 * details.
> +	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
> +	 * more details.
>  	 */
>  	PageClearAllVisible(page);
>  }

Not introduced by your change, but isn't it rather terrifying that the
wal_consistency_checking infrastructure doesn't verify whether the page is
marked all-visible? Wasn't aware of this. Seems bonkers to me.

I don't even know what specifically in heap_xlog_visible() that comment is
referring to? Just that we only do PageSetAllVisible() if BLK_NEEDS_REDO? But
uh, what does that have to do with anything?


> diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> index e21b96281a6..f1da52b2069 100644
> --- a/src/backend/access/heap/visibilitymap.c
> +++ b/src/backend/access/heap/visibilitymap.c
> @@ -14,8 +14,7 @@
>   *		visibilitymap_clear  - clear bits for one page in the visibility map
>   *		visibilitymap_pin	 - pin a map page for setting a bit
>   *		visibilitymap_pin_ok - check whether correct map page is already pinned
> - *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
> - *		visibilitymap_set_vmbits - set bit(s) in a pinned page
> + *		visibilitymap_set	 - set bit(s) in a previously pinned page
>   *		visibilitymap_get_status - get status of bits
>   *		visibilitymap_count  - count number of bits set in visibility map
>   *		visibilitymap_prepare_truncate -

There's a comment saying:

 * Clearing visibility map bits is not separately WAL-logged.  The callers
 * must make sure that whenever a bit is cleared, the bit is cleared on WAL
 * replay of the updating operation as well.

Which kinda implies that setting the VM *is* separately WAL logged. But that's
not true anymore.  Maybe rephrase that ever so slightly?


> @@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
>  	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
>  	 *
>  	 * This can happen when replaying already-applied WAL records after a
> -	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
> -	 * record that marks as frozen a page which was already all-visible.  It's
> -	 * also quite common with records generated during index deletion
> -	 * (original execution of the deletion can reason that a recovery conflict
> -	 * which is sufficient for the deletion operation must take place before
> -	 * replay of the deletion record itself).
> +	 * standby crash or restart

Again not about your patch: I don't understand how already-applied WAL can
lead to InvalidTransactionId being passed here. The record doesn't change just
because we had already applied the WAL?



> From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 3 Dec 2025 15:07:24 -0500
> Subject: [PATCH v40 08/12] Track which relations are modified by a query
> 
> Save the relids of modified relations in a bitmap in the executor state.
> A later commit will pass this information down to scan nodes to control
> whether or not on-access pruning is allowed to set the visibility map.
> Setting the visibility map during a scan is counterproductive if the
> query is going to modify the page immediately after.
> 
> Relations are considered modified if they are the target of INSERT,
> UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
> FOR UPDATE/SHARE). All row mark types are included, even those which
> don't actually modify tuples, because this bitmap is only used as a hint
> to avoid unnecessary work.

You're probably going to hate me for the question, but is there a reason to
not compute es_modified_relids at plan time?


> @@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
>  	 */
>  	planstate = ExecInitNode(plan, estate, eflags);
>  
> +#ifdef USE_ASSERT_CHECKING
> +	CrossCheckModifiedRelids(estate);
> +#endif

Not sure that buys you much, given it pretty much is just a restatement of the
code building estate->es_modified_relids.

What about checking against PlannedStmt->{resultRelations, permInfos} or
asserting membership at the places that actually lock/modify?


> @@ -3048,6 +3056,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
>  	rcestate->es_output_cid = parentestate->es_output_cid;
>  	rcestate->es_queryEnv = parentestate->es_queryEnv;
>  
> +	/*
> +	 * Use a deep copy to avoid stale pointers since bms_add_member() may
> +	 * reallocate the bitmap.
> +	 */
> +	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
> +
>  	/*
>  	 * ResultRelInfos needed by subplans are initialized from scratch when the
>  	 * subplans themselves are initialized.

Hm. Why copy at all from the parent? Afaict we'll just redo the computation of
es_modified_relids from scratch anyway?  Not sure about it though.



> From 05d736fb5b0600effede5e030d5b929274dabe2c Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 2 Mar 2026 16:31:17 -0500
> Subject: [PATCH v40 09/12] Thread flags through begin-scan APIs
> 
> Add a flags parameter to the index_fetch_begin() table AM callback and
> the begin-scan helpers so the executor can pass context for building
> scan descriptors. This introduces an extension point for follow-up work
> to mark relations as read-only for the current query, without changing
> behavior in this patch.



> diff --git a/src/include/access/genam.h b/src/include/access/genam.h
> index 1a27bf060b3..db102803eb5 100644
> --- a/src/include/access/genam.h
> +++ b/src/include/access/genam.h
> @@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
>  									 Relation indexRelation,
>  									 Snapshot snapshot,
>  									 IndexScanInstrumentation *instrument,
> -									 int nkeys, int norderbys);
> +									 int nkeys, int norderbys, uint32 flags);

I'd probably put flags in a position where it's not as easily confused with
nkeys or norderbys.


> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> index 4ce63990326..3820bbd7f9f 100644
> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -96,8 +96,9 @@ typedef struct HeapScanDescData
>  	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
>  
>  	/*
> -	 * For sequential scans and bitmap heap scans. The current heap block's
> -	 * corresponding page in the visibility map.
> +	 * For sequential scans, bitmap heap scans, TID range scans, and sample
> +	 * scans. The current heap block's corresponding page in the visibility
> +	 * map.
>  	 */
>  	Buffer		rs_vmbuffer;

As you already can see here, exhaustively listing scan types is unlikely to be
maintained over time...


> @@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
>  static inline TableScanDesc
>  table_beginscan_tidrange(Relation rel, Snapshot snapshot,
>  						 ItemPointer mintid,
> -						 ItemPointer maxtid)
> +						 ItemPointer maxtid, uint32 flags)
>  {
>  	TableScanDesc sscan;
> -	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> +
> +	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
>  
>  	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);

Hm. Would it perhaps be a good idea to have an assert as to which flags are
specified by the "user"? If e.g. another SO_TYPE_* were specified it might
result in some odd behaviour.

Perhaps this would be best done by adding an argument to
table_beginscan_common() specifying the "internal" flags (i.e. the ones that
are specified inside table_beginscan_*) separately from the user-specified
flags?  Then table_beginscan_common() could check that the set of
user-specified flags is sane.
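
To make the suggestion concrete, here is a minimal standalone sketch of the
split. The SK_* flag names and values, the mask, and the sk_* functions are
all made up for illustration; they are not the real SO_* definitions or the
actual table_beginscan_common() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; not the real SO_* bits. */
#define SK_TYPE_SEQSCAN        (1u << 0)
#define SK_TYPE_TIDRANGESCAN   (1u << 1)
#define SK_ALLOW_PAGEMODE      (1u << 2)
#define SK_HINT_REL_READ_ONLY  (1u << 3)    /* stand-in for a caller hint */

/* Only these bits may be supplied by the caller of a begin-scan helper. */
#define SK_USER_FLAGS_MASK     (SK_HINT_REL_READ_ONLY)

/* True if the caller passed nothing but permitted bits. */
static bool
sk_user_flags_sane(uint32_t user_flags)
{
    return (user_flags & ~SK_USER_FLAGS_MASK) == 0;
}

/* Common entry point: validate the user flags, then merge both sets. */
static uint32_t
sk_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
    assert(sk_user_flags_sane(user_flags));
    return internal_flags | user_flags;
}

/* Wrapper in the style of table_beginscan_tidrange(). */
static uint32_t
sk_beginscan_tidrange(uint32_t user_flags)
{
    return sk_beginscan_common(SK_TYPE_TIDRANGESCAN | SK_ALLOW_PAGEMODE,
                               user_flags);
}
```

With that shape, a caller that accidentally passes another scan-type bit trips
the assertion in one place instead of silently combining two scan types.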



> From 7790c8177ba3aa8a8bd1a216ea77fdfd42efc1bf Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 2 Mar 2026 16:31:33 -0500
> Subject: [PATCH v40 10/12] Pass down information on table modification to scan
>  node

> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> index 3820bbd7f9f..1a7306e2935 100644
> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -132,6 +132,12 @@ typedef struct IndexFetchHeapData
>  
>  	/* Current heap block's corresponding page in the visibility map */
>  	Buffer		xs_vmbuffer;
> +
> +	/*
> +	 * Some optimizations can only be performed if the query does not modify
> +	 * the underlying relation. Track that here.
> +	 */
> +	bool		modifies_base_rel;
>  } IndexFetchHeapData;
>  

The other members are prefixed with xs_, I don't see a reason to diverge for
this one.

Wonder if this should be in the generic IndexFetchTableData?


> From 0a16dad7a4ebe224f35629a39619d0feb03f03a3 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Fri, 27 Feb 2026 16:33:40 -0500
> Subject: [PATCH v40 11/12] Allow on-access pruning to set pages all-visible
> 
> Many queries do not modify the underlying relation. For such queries, if
> on-access pruning occurs during the scan, we can check whether the page
> has become all-visible and update the visibility map accordingly.
> Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> all-frozen.

> This commit implements on-access VM setting for sequential scans as well
> as for the underlying heap relation in index scans and bitmap heap
> scans.

I'd mention that this often can:
- avoid write amplification, due to vacuum later having to PageSetAllVisible()
  (often triggering another data write and another FPI)
- allow index only scans much earlier than before

I think those are pretty huge benefits, so they should be mentioned.


> diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
> index d264a698ff6..a5536ba4ff6 100644
> --- a/src/test/recovery/t/035_standby_logical_decoding.pl
> +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> @@ -296,6 +296,7 @@ wal_level = 'logical'
>  max_replication_slots = 4
>  max_wal_senders = 4
>  autovacuum = off
> +hot_standby_feedback = on
>  });
>  $node_primary->dump_info;
>  $node_primary->start;
> @@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
>  $logstart = -s $node_standby->logfile;
>  
>  reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
> -	'no_conflict_', 0, 1);
> +	'no_conflict_', 1, 0);
>  
>  # This should not trigger a conflict
>  wait_until_vacuum_can_remove(
> -- 
> 2.43.0

Why does this patch need to change anything here? Is the test buggy
independently?



> From e4c7112d49e650f59dab834d3db6007c69f34f1a Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 29 Jul 2025 16:12:56 -0400
> Subject: [PATCH v40 12/12] Set pd_prune_xid on insert
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Now that visibility map (VM) updates can occur during read-only queries,
> it makes sense to also set the page’s pd_prune_xid hint during inserts
> and on the new page during updates.
> 
> This enables heap_page_prune_and_freeze() to run and set the VM
> all-visible after a page is filled with newly inserted tuples the first
> time it is read.
> 
> This change also addresses a long-standing note in heap_insert() and
> heap_multi_insert(), which observed that setting pd_prune_xid would
> help clean up aborted insertions sooner. Without it, such tuples might
> linger until VACUUM, whereas now they can be pruned earlier.

I think this commit message should also say more about what the benefits of
doing this are (i.e. a good potential for reduced write amplification and
increased IOS potential).


> The index killtuples test had to be updated to reflect a larger number
> of hits by some accesses. Since the prune_xid is set by the fill/insert
> step, on-access pruning can happen during the first access step (before
> the DELETE). This is when the VM is extended. After the DELETE, the next
> access hits the VM block instead of extending it. Thus, an additional
> buffer hit is counted for the table.

I think that may already have been solved by f5eb854ab6d.

> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>  	TransactionId xid = GetCurrentTransactionId();
>  	HeapTuple	heaptup;
>  	Buffer		buffer;
> +	Page		page;
>  	Buffer		vmbuffer = InvalidBuffer;
>  	bool		all_visible_cleared = false;
>  
> @@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>  									   &vmbuffer, NULL,
>  									   0);
>  
> +	page = BufferGetPage(buffer);
> +
>  	/*
>  	 * We're about to do the actual insert -- but check for conflict first, to
>  	 * avoid possibly having to roll back work we've just done.
> @@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>  	RelationPutHeapTuple(relation, buffer, heaptup,
>  						 (options & HEAP_INSERT_SPECULATIVE) != 0);
>  
> -	if (PageIsAllVisible(BufferGetPage(buffer)))
> +	if (PageIsAllVisible(page))
>  	{
>  		all_visible_cleared = true;
> -		PageClearAllVisible(BufferGetPage(buffer));
> +		PageClearAllVisible(page);
>  		visibilitymap_clear(relation,
>  							ItemPointerGetBlockNumber(&(heaptup->t_self)),
>  							vmbuffer, VISIBILITYMAP_VALID_BITS);
>  	}

The repeated BufferGetPage()s have been bothering me, good :)


>  	/*
> -	 * XXX Should we set PageSetPrunable on this page ?
> +	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
> +	 * is full so that we can set the page all-visible in the VM on the next
> +	 * page access.
>  	 *
> -	 * The inserting transaction may eventually abort thus making this tuple
> -	 * DEAD and hence available for pruning. Though we don't want to optimize
> -	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
> -	 * aborted tuple will never be pruned until next vacuum is triggered.
> +	 * Setting pd_prune_xid is also handy if the inserting transaction
> +	 * eventually aborts making this tuple DEAD and hence available for
> +	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
> +	 * tuple would never otherwise be pruned until next vacuum is triggered.
>  	 *
> -	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
> +	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
> +	 * tuple.
>  	 */
> +	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
> +		PageSetPrunable(page, xid);
>  
>  	MarkBufferDirty(buffer);
>  

Perhaps add "as neither of those can be pruned anyway." or such to the last
sentence?



> @@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
>  			prstate->set_all_visible = false;
>  			prstate->set_all_frozen = false;
>  
> -			/* The page should not be marked all-visible */
> -			if (PageIsAllVisible(page))
> -				heap_fix_vm_corruption(prstate, offnum);
> -

Huh?


Getting close, I think.


Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-20 02:38  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-20 02:38 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the detailed review! Unless otherwise specified, attached
v41 includes all of your straightforward review points.

On Wed, Mar 18, 2026 at 1:14 PM Andres Freund <[email protected]> wrote:
>
> > +                     params.relation = relation;
> > +                     params.buffer = buffer;
> > +                     params.vmbuffer = *vmbuffer;
> > +                     params.reason = PRUNE_ON_ACCESS;
> > +                     params.vistest = vistest;
> > +                     params.cutoffs = NULL;
> >
>
> >                        * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
> > @@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> >                        * cannot safely determine that during on-access pruning with the
> >                        * current implementation.
> >                        */
> > -                     PruneFreezeParams params = {
> > -                             .relation = relation,
> > -                             .buffer = buffer,
> > -                             .reason = PRUNE_ON_ACCESS,
> > -                             .options = 0,
> > -                             .vistest = vistest,
> > -                             .cutoffs = NULL,
> > -                     };
> > +                     params.options = 0;
> >
> >                       heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
> >                                                                          NULL, NULL);
>
> Why does this change the way the PruneFreezeParams variable is defined?  I
> don't really mind, it's just a bit confusing.

I couldn't use the designated initializer after visibilitymap_pin()
and I thought it was worse to have the designated initializer
initialize vmbuffer to InvalidBuffer and then have to set vmbuffer to
the real vmbuffer after visibilitymap_pin().

> > + * Helper to fix visibility-related corruption on a heap page and its
> > + * corresponding VM page. An all-visible page cannot have dead items nor can
> > + * it have tuples that are not visible to all running transactions. It clears
> > + * the VM corruption as well as resetting the vmbits used during pruning.
>
> So this is now only called when we already know there's corruption?  I think
> that could be clearer in the comments.
>
> Seems a bit odd that the function then figures out what it should do from the
> page & VM contents, given that the caller already needs to have known that
> something is wrong?

Yea, it was all a bit off. I agree. I've tried something new and made
a VMCorruptionType enum for the caller to pass in which tells this
function what to do (clear PD_ALL_VISIBLE and/or clear VM) and what
warning to emit.
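
Roughly, the shape is something like the standalone sketch below. This is not
the v41 code: only VM_CORRUPT_TUPLE_VISIBILITY is a name that appears in this
thread; the other enum members, the SketchFixAction struct, the function
names, and the first warning string are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch enum; only the TUPLE_VISIBILITY case is named in the thread. */
typedef enum SketchVMCorruptionType
{
    SKETCH_VM_CORRUPT_NONE = 0,             /* invalid: caller must say why */
    SKETCH_VM_CORRUPT_LPDEAD,               /* hypothetical member */
    SKETCH_VM_CORRUPT_TUPLE_VISIBILITY,
    SKETCH_VM_CORRUPT_MISSING_PAGE_HINT     /* hypothetical member */
} SketchVMCorruptionType;

typedef struct SketchFixAction
{
    bool        clear_page_hint;    /* clear PD_ALL_VISIBLE on the heap page */
    bool        clear_vm;           /* clear the VM bits */
    const char *warning;            /* NULL means no valid type was given */
} SketchFixAction;

/* Map the caller-supplied corruption type to what must be done. */
static SketchFixAction
sketch_vm_fix_action(SketchVMCorruptionType type)
{
    SketchFixAction action = {false, false, NULL};

    switch (type)
    {
        case SKETCH_VM_CORRUPT_LPDEAD:
            action.clear_page_hint = true;
            action.clear_vm = true;
            /* made-up message text for this sketch */
            action.warning = "dead item found on page marked all-visible";
            break;
        case SKETCH_VM_CORRUPT_TUPLE_VISIBILITY:
            action.clear_page_hint = true;
            action.clear_vm = true;
            action.warning = "tuple not visible to all transactions found on page marked all-visible";
            break;
        case SKETCH_VM_CORRUPT_MISSING_PAGE_HINT:
            action.clear_vm = true;
            action.warning = "page is not marked all-visible but visibility map bit is set";
            break;
        case SKETCH_VM_CORRUPT_NONE:
            break;
    }
    return action;
}

/* Refuse to clear anything unless a warning will be emitted for it. */
static SketchFixAction
sketch_fix_vm_corruption(SketchVMCorruptionType type)
{
    SketchFixAction action = sketch_vm_fix_action(type);

    assert(action.warning != NULL);
    return action;
}
```

The point of the split is that the function no longer infers the problem from
the page and VM contents; the caller states it, and an unrecognized type is
rejected before anything is cleared.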

> > +static void
> > +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> > +{
> > +             {
> > +                     ereport(WARNING,
> > +                                     (errcode(ERRCODE_DATA_CORRUPTED),
> > +                                      errmsg("tuple not visible to all transactions found on page marked all-visible"),
> > +                                      errcontext("relation \"%s\", page %u, tuple %u",
> > +                                                             relname, prstate->block, offnum)));
> > +             }
>
> Wait, why are we now WARNING about the PageIsAllVisible() &&
> prstate->lpdead_items == 0 case? Seems to run flatly counter to the comment
> above about GetOldestNonRemovableTransactionId() going backward?

Only if the page has tuples with HTSV_Result
HEAPTUPLE_RECENTLY_DEAD/DELETE_IN_PROGRESS/INSERT_IN_PROGRESS. Even if
GetOldestNonRemovableTransactionId() goes backwards that should only
make it so that xids we previously thought were visible now show as
not visible to all. But those have to be HEAPTUPLE_LIVE tuples. We
should never have thought it was all-visible if there were in-progress
deletes/inserts. So, I think it is okay. Now (in v41), the caller
would need to pass VM_CORRUPT_TUPLE_VISIBILITY and intend to emit the
warning.

> > +     else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> > +     {
> > +             /*
> > +              * As of PostgreSQL 9.2, the visibility map bit should never be set if
> > +              * the page-level bit is clear. However, for vacuum, it's possible
> > +              * that the bit got cleared after heap_vac_scan_next_block() was
> > +              * called, so we must recheck now that we have the buffer lock before
> > +              * concluding that the VM is corrupt.
> > +              */
> > +             ereport(WARNING,
> > +                             (errcode(ERRCODE_DATA_CORRUPTED),
> > +                              errmsg("page is not marked all-visible but visibility map bit is set"),
> > +                              errcontext("relation \"%s\", page %u",
> > +                                                     relname, prstate->block)));
> > +     }
> > +
> > +     visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
> > +                                             VISIBILITYMAP_VALID_BITS);
> > +     prstate->vmbits = 0;
>
> So we can end up clearing the VM without emitting any warning?

This was me trying to avoid duplicating code in the branches. In v41,
I error out if the caller doesn't specify a valid corruption type, so
anything that clears the VM will have emitted a warning.

> > +static void
> > +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> > +{
> > +     OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> > +     Page            page = prstate->page;
> > +
> > +     Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> > +                (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> > +                     !prstate->attempt_freeze));
> > +
> > +     /* We'll fill in presult for the caller */
> > +     memset(presult, 0, sizeof(PruneFreezeResult));
> > +
> > +     presult->vmbits = prstate->vmbits;
> > +
> > +     /* Clear any stale prune hint */
> > +     if (TransactionIdIsValid(PageGetPruneXid(page)))
> > +     {
> > +             PageClearPrunable(page);
> > +             MarkBufferDirtyHint(prstate->buffer, true);
> > +     }
> > +
> > +     if (PageIsEmpty(page))
> > +             return;
> > +
> > +     presult->hastup = true;
>
> Is that actually a given? Couldn't the page consist solely of unused
> items? That'd make PageIsEmpty() return false, but should still allow
> truncation.

Good point. I've changed it to set hastup when counting live tuples.

But should I set hastup if I see an LP_REDIRECT pointer? I know I
should always see an LP_NORMAL pointer if I see an LP_REDIRECT pointer,
but I just wondered if I should explicitly set hastup when I see
LP_REDIRECT since heap_prune_record_redirect() sets hastup = true.
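
For illustration, a standalone sketch of deriving hastup from the line
pointers rather than from PageIsEmpty(). The SketchLpState enum and
sketch_page_hastup() are hypothetical stand-ins, and treating LP_REDIRECT as
setting hastup is only one possible answer to the question above, not a
settled one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the line pointer states; not the real ItemId flags. */
typedef enum SketchLpState
{
    SK_LP_UNUSED,
    SK_LP_NORMAL,
    SK_LP_REDIRECT,
    SK_LP_DEAD
} SketchLpState;

/*
 * A page made up only of unused items is not "empty" in the PageIsEmpty()
 * sense, but it still has nothing that should block truncation.
 */
static bool
sketch_page_hastup(const SketchLpState *items, size_t nitems)
{
    bool        hastup = false;

    assert(items != NULL || nitems == 0);

    for (size_t i = 0; i < nitems; i++)
    {
        /* assumption: redirects count as well as normal items */
        if (items[i] == SK_LP_NORMAL || items[i] == SK_LP_REDIRECT)
            hastup = true;
    }
    return hastup;
}
```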

> > diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
>
> > +     /*
> > +      * After processing all the live tuples on the page, if the newest xmin
> > +      * amongst them may be considered running by any snapshot, the page cannot
> > +      * be all-visible.
> > +      */
> > +     if (prstate.set_all_visible &&
> > +             TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
> > +             GlobalVisTestXidMaybeRunning(prstate.vistest,
> > +                                                                      prstate.visibility_cutoff_xid))
> > +             prstate.set_all_visible = prstate.set_all_frozen = false;
> > +
>
> So the docs for prstate.visibility_cutoff_xid say:
>
>          * visibility_cutoff_xid is the newest xmin of live tuples on the page.
>          * The caller can use it as the conflict horizon, when setting the VM
>          * bits.  It is only valid if we froze some tuples, and set_all_frozen is
>          * true.
>
> But here we look at it without checking that we froze some tuples.  I guess
> the comment is outdated?

That comment was never correct -- or I have chopped it into
unrecognizable bits over the last two years.

> Could the "going backward" thing possibly trigger a spurious assert in
>
>         Assert(heap_page_is_all_visible(vacrel->rel, buf,
>                                         vacrel->vistest, &debug_all_frozen,
>                                         &debug_cutoff, &vacrel->offnum));

I don't think anything (today) updates GlobalVisState between
GlobalVisTestXidMaybeRunning() and the heap_page_is_all_visible()
assert.

I had removed the visibility_cutoff_xid part of the assertion on the
intuition that comparing an exact horizon would no longer work when
using GlobalVisState. I can't remember if I actually saw failing
tests, but I don't see them anymore (so I've put it back).

The heap_page_is_all_visible() assertion moves into
heap_page_prune_and_freeze() in a later patch in this set, and while
it is also in a place where I don't think GlobalVisState can have
moved between making the page changes and calling
heap_page_is_all_visible(), I suspect it won't be a totally reliable
assertion now that it uses a moving target for comparison. What do you
think?

> > From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Sat, 28 Feb 2026 16:06:51 -0500
> > Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
> >  all-visible
>
> I guess I'd have expected 03 and 04 to be swapped... But whatever.

It couldn't be because I used GlobalVisState to always keep it
up-to-date (even for on-access pruning).

> > @@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >               prstate.set_all_visible = prstate.set_all_frozen = false;
> >
> >       Assert(!prstate.set_all_frozen || prstate.set_all_visible);
> > +     Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
>
> Why didn't we have this assert earlier?

It was in lazy_scan_prune() as:
    Assert(!presult.set_all_visible || !(*has_lpdead_items));

> > +     do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
>
> Most of the other heap_page_prune_and_freeze() helpers are named
> heap_prune_xyz(), why not follow that here?
>
> I guess this holds for a few other helpers added in earlier commits
> too. E.g. heap_page_bypass_prune_freeze() should probably be
> heap_prune_bypass_prune_freeze() or such.

Most of the helpers prefixed with "heap_prune" now directly do
something related to pruning like recording line pointers and
traversing hot chains. heap_page_will_set_vm() and
heap_page_will_freeze() have nothing to do with pruning, so I think it
makes sense they are named differently.

And I don't think we are in any danger of folks using functions not
prefixed with heap_prune for other purposes, given that most of them
take a PruneState as an argument.

I'll do a big rename if you feel strongly about it, though.

> > @@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >
> >               /*
> >                * If that's all we had to do to the page, this is a non-WAL-logged
> > -              * hint.  If we are going to freeze or prune the page, we will mark
> > -              * the buffer dirty below.
> > +              * hint.  If we are going to freeze or prune the page or set
> > +              * PD_ALL_VISIBLE, we will mark the buffer dirty below.
> > +              *
> > +              * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
> > +              * for the VM to be set and PD_ALL_VISIBLE to be clear.
> >                */
> > -             if (!do_freeze && !do_prune)
> > +             if (!do_freeze && !do_prune && !do_set_vm)
> >                       MarkBufferDirtyHint(prstate.buffer, true);
> >       }
>
> This block is gated by if (do_hint_prune) which is computed as:
>
>         /*
>          * Even if we don't prune anything, if we found a new value for the
>          * pd_prune_xid field or the page was marked full, we will update the hint
>          * bit.
>          */
>         do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
>                 PageIsFull(prstate.page);
>
> It's not really related to this change, but I'm just confused a bit by the
> "|| PageIsFull(prstate.page)". What is that about? Why do we want to mark the
> buffer DirtyHint if the page is full? It very well might already have been
> marked as such, no?

Because if the page is marked full, we clear that hint, and, if that's
the only change we make to the page, we need to do
MarkBufferDirtyHint().
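
As a standalone sketch of that logic (SketchHeapPage and
sketch_update_prune_hints() are made-up names, and "other_changes" stands in
for do_freeze || do_prune || do_set_vm):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page header: just the two hints discussed here. */
typedef struct SketchHeapPage
{
    uint32_t    prune_xid;  /* stand-in for pd_prune_xid */
    bool        full;       /* stand-in for the page-full hint */
} SketchHeapPage;

/*
 * Update the hints and report whether the buffer must be marked dirty as a
 * hint-only change. If other changes follow, the buffer is dirtied by those
 * instead.
 */
static bool
sketch_update_prune_hints(SketchHeapPage *page, uint32_t new_prune_xid,
                          bool other_changes)
{
    bool        do_hint_prune;

    assert(page != NULL);

    do_hint_prune = page->prune_xid != new_prune_xid || page->full;
    if (!do_hint_prune)
        return false;

    page->prune_xid = new_prune_xid;
    page->full = false;     /* clearing this is itself a page change */

    return !other_changes;
}
```

So a page whose prune xid is unchanged but which was marked full still counts
as modified, because the full hint gets cleared.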

> > +#ifdef USE_ASSERT_CHECKING
> > +     if (prstate.set_all_visible)
> > +     {
> > +             TransactionId debug_cutoff;
> > +             bool            debug_all_frozen;
> > +
> > +             Assert(prstate.lpdead_items == 0);
> > +
> > +             Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
> > +                                                                             prstate.vistest,
> > +                                                                             &debug_all_frozen,
> > +                                                                             &debug_cutoff, off_loc));
> > +
> > +             /*
> > +              * It's possible the page is composed entirely of frozen tuples but is
> > +              * not set all-frozen in the VM and did not pass
> > +              * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
> > +              * heap_page_is_all_visible() finds the page completely frozen, even
> > +              * though prstate.set_all_frozen is false.
> > +              */
> > +             Assert(!prstate.set_all_frozen || debug_all_frozen);
>
> Seems like we could verify that debug_cutoff isn't newer than conflict_xid?

Well, not conflict_xid, but newest_xid, yes.

> Hm.  I guess aborting after we did incorrect pruning/freezing/VMing is better
> than not, but it'd be even better if we did it before corrupting things. But I
> guess it'd be not trivial to add something like the debug_cutoff assertion I
> suggest above, when freezing of tuples is only executed after
> heap_page_is_all_visible() (for dead tuples heap_page_would_be_all_visible()
> already has provisions).
>
> It's probably more a theoretical concern than a real worry.

Yea, I think the work it would take to make
heap_page_would_be_all_visible() work for frozen tuples wouldn't be
worth it just to get it to assert out before executing the page
changes.

> > +     presult->new_all_visible_pages = 0;
> > +     presult->new_all_frozen_pages = 0;
> > +     presult->new_all_visible_frozen_pages = 0;
>
> Isn't it odd to talk about pages here? Given that heap_page_prune_and_freeze()
> only ever operates on exactly one page.  Is that just so you can do
>
> > +     vacrel->new_all_visible_pages += presult.new_all_visible_pages;

I made this change because you didn't like it when I passed old_vmbits
and new_vmbits back out to lazy_scan_prune() to derive these counters.
FWIW I think it's better not to have lazy_scan_prune() compare new and
old vmbits to increment counters, because lazy_scan_prune() shouldn't
have to know about the VM anymore once it is not setting it.

> > +     if (do_set_vm)
> > +     {
> > +             if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> > +             {
> > +                     presult->new_all_visible_pages = 1;
> > +                     if (prstate.set_all_frozen)
> > +                             presult->new_all_visible_frozen_pages = 1;
> > +             }
> > +             else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> > +                              prstate.set_all_frozen)
> > +                     presult->new_all_frozen_pages = 1;
> > +     }
> > +
> >       if (prstate.attempt_freeze)
> >       {
> >               if (presult->nfrozen > 0)
>
> Feels like this is kinda redoing what heap_page_will_set_vm already did.

The logic is different than what is in heap_page_will_set_vm() because
there we don't care about what old_vmbits is. We are simply concerned
with whether we should set new_vmbits to something.

So we need to have logic somewhere that is figuring out if the vmbits
were set before and whether we newly set them. That can either go in
heap_page_prune_and_freeze() and we can use that to set the counters
in the LVRelState or it can go in lazy_scan_prune().

I think it makes more sense in heap_page_prune_and_freeze() so that
lazy_scan_prune() doesn't have to know about the VM's new/old state,
which it otherwise no longer deals with.
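
To spell out the old-versus-new comparison, here is a standalone sketch of the
counter derivation from the hunk quoted above. The SK_VM_* values, the struct,
and the function name are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch bit values; stand-ins for the VISIBILITYMAP_* flags. */
#define SK_VM_ALL_VISIBLE  0x01
#define SK_VM_ALL_FROZEN   0x02

typedef struct SketchVMCounters
{
    int         new_all_visible;
    int         new_all_visible_frozen;
    int         new_all_frozen;
} SketchVMCounters;

/*
 * Whether we set the VM and whether the bits are newly set are separate
 * questions: the counters depend on what the VM said before.
 */
static SketchVMCounters
sketch_count_vm_transitions(uint8_t old_vmbits, bool do_set_vm,
                            bool set_all_frozen)
{
    SketchVMCounters c = {0, 0, 0};

    /* all-frozen is only ever set together with all-visible */
    assert(!set_all_frozen || do_set_vm);

    if (!do_set_vm)
        return c;

    if ((old_vmbits & SK_VM_ALL_VISIBLE) == 0)
    {
        c.new_all_visible = 1;
        if (set_all_frozen)
            c.new_all_visible_frozen = 1;
    }
    else if ((old_vmbits & SK_VM_ALL_FROZEN) == 0 && set_all_frozen)
        c.new_all_frozen = 1;

    return c;
}
```

A caller like lazy_scan_prune() would then only add the three fields to its
running totals and never look at the VM bits themselves.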


> > @@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
> >                       PageSetAllVisible(page);
> >                       PageClearPrunable(page);
> > -                     visibilitymap_set(vacrel->rel, blkno, buf,
> > -                                                       InvalidXLogRecPtr,
> > -                                                       vmbuffer, InvalidTransactionId,
> > -                                                       VISIBILITYMAP_ALL_VISIBLE |
> > -                                                       VISIBILITYMAP_ALL_FROZEN);
> > +                     visibilitymap_set_vmbits(blkno,
> > +                                                                      vmbuffer,
> > +                                                                      VISIBILITYMAP_ALL_VISIBLE |
> > +                                                                      VISIBILITYMAP_ALL_FROZEN,
> > +                                                                      vacrel->rel->rd_locator);
> > +
> > +                     /*
> > +                      * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
> > +                      * setting the VM.
> > +                      */
> > +                     if (RelationNeedsWAL(vacrel->rel))
> > +                             log_heap_prune_and_freeze(vacrel->rel, buf,
> > +                                                                               vmbuffer,
> > +                                                                               VISIBILITYMAP_ALL_VISIBLE |
> > +                                                                               VISIBILITYMAP_ALL_FROZEN,
> > +                                                                               InvalidTransactionId, /* conflict xid */
> > +                                                                               false,        /* cleanup lock */
> > +                                                                               PRUNE_VACUUM_SCAN,    /* reason */
> > +                                                                               NULL, 0,
> > +                                                                               NULL, 0,
> > +                                                                               NULL, 0,
> > +                                                                               NULL, 0);
> > +
> >                       END_CRIT_SECTION();
>
> It's a tad odd that we do:
>
>                         /*
>                          * It's possible that another backend has extended the heap,
>                          * initialized the page, and then failed to WAL-log the page due
>                          * to an ERROR.  Since heap extension is not WAL-logged, recovery
>                          * might try to replay our record setting the page all-visible and
>                          * find that the page isn't initialized, which will cause a PANIC.
>                          * To prevent that, check whether the page has been previously
>                          * WAL-logged, and if not, do that now.
>                          */
>                         if (RelationNeedsWAL(vacrel->rel) &&
>                                 !XLogRecPtrIsValid(PageGetLSN(page)))
>                                 log_newpage_buffer(buf, true);
>
> if we then immediately afterwards emit a WAL record that could just as well
> have included an FPI of the heap page.

I originally added a flag to log_heap_prune_and_freeze() that could
force an FPI but Robert disliked it, saying he found it more
confusing. He said:

> 0004. It is not clear to me why you need to get
> log_heap_prune_and_freeze to do the work here. Why can't
> log_newpage_buffer get the job done already?

I can put it back that way, I don't have strong feelings either way.
Though I imagine if I add another argument to
log_heap_prune_and_freeze(), you'll bring up creating a struct for its
arguments again...


> > diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
> > index 8a67bfa1aff..d9042e1f91d 100644
> > --- a/src/backend/access/common/bufmask.c
> > +++ b/src/backend/access/common/bufmask.c
> > @@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
> >
> >       /*
> >        * During replay, if the page LSN has advanced past our XLOG record's LSN,
> > -      * we don't mark the page all-visible. See heap_xlog_visible() for
> > -      * details.
> > +      * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
> > +      * more details.
> >        */
> >       PageClearAllVisible(page);
> >  }
>
> Not introduced by your change, but isn't it rather terrifying that the
> wal_consistency_checking infrastructure doesn't verify whether the page is
> marked all-visible? Wasn't aware of this. Seems bonkers to me.

Agreed. I wonder what it would take to start.

> I don't even know what specifically in heap_xlog_visible() that comment is
> referring to? Just that we only do PageSetAllVisible() if BLK_NEEDS_REDO? But
> uh, what does that have to do with anything?

Yea, this comment doesn't make sense. I think we should remove it.

But regarding why we mask PD_ALL_VISIBLE in wal consistency checking,
I wonder if this is the scenario:

Record 1 sets the VM and PD_ALL_VISIBLE
Record 2 inserts a tuple and clears PD_ALL_VISIBLE
the heap page is flushed to disk, but the VM page is not
crash
replay R1; skip setting PD_ALL_VISIBLE because the page has R2's LSN;
set the bits on the VM page

Even though we'll clear the VM when we replay R2, if we cross-check
the page and VM after replaying only R1, the VM will be set and
PD_ALL_VISIBLE will be clear. I think this is okay because no one
should see them at this time. But it might not work with wal
consistency checking.
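
A toy model of that sequence, in case it helps to see it laid out. Everything
here is invented for the sketch (SketchPage, the LSN values 100 and 200, the
sketch_* functions), and it simplifies real redo: in particular it assumes the
VM bit is cleared unconditionally when replaying the insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page: an LSN plus one "all-visible" bit. */
typedef struct SketchPage
{
    uint64_t    lsn;
    bool        all_visible;
} SketchPage;

/* Redo is skipped when the page LSN is already at or past the record's. */
static bool
sketch_needs_redo(const SketchPage *page, uint64_t record_lsn)
{
    return page->lsn < record_lsn;
}

/* R1: set PD_ALL_VISIBLE on the heap page and the bit on the VM page. */
static void
sketch_replay_set_visible(SketchPage *heap, SketchPage *vm, uint64_t lsn)
{
    assert(heap != NULL && vm != NULL);
    if (sketch_needs_redo(heap, lsn))
    {
        heap->all_visible = true;
        heap->lsn = lsn;
    }
    if (sketch_needs_redo(vm, lsn))
    {
        vm->all_visible = true;
        vm->lsn = lsn;
    }
}

/* R2: insert a tuple, clearing PD_ALL_VISIBLE and the VM bit. */
static void
sketch_replay_insert(SketchPage *heap, SketchPage *vm, uint64_t lsn)
{
    assert(heap != NULL && vm != NULL);
    vm->all_visible = false;    /* simplification: always cleared */
    if (sketch_needs_redo(heap, lsn))
    {
        heap->all_visible = false;
        heap->lsn = lsn;
    }
}

/* Returns true if the VM bit is set while the page-level bit is clear. */
static bool
sketch_crash_scenario(bool also_replay_r2)
{
    /* heap page was flushed with R2 applied; VM page never reached disk */
    SketchPage  heap = {200, false};
    SketchPage  vm = {0, false};

    sketch_replay_set_visible(&heap, &vm, 100);
    if (also_replay_r2)
        sketch_replay_insert(&heap, &vm, 200);

    return vm.all_visible && !heap.all_visible;
}
```

In this model the mismatch exists only between replaying R1 and replaying R2,
which is the window a per-record consistency check would observe.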

> > @@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
> >        * If we get passed InvalidTransactionId then we do nothing (no conflict).
> >        *
> >        * This can happen when replaying already-applied WAL records after a
> > -      * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
> > -      * record that marks as frozen a page which was already all-visible.  It's
> > -      * also quite common with records generated during index deletion
> > -      * (original execution of the deletion can reason that a recovery conflict
> > -      * which is sufficient for the deletion operation must take place before
> > -      * replay of the deletion record itself).
> > +      * standby crash or restart
>
> Again not about your patch: I don't understand how already applied WAL can
> lead to InvalidTransactionId being passed here. The record doesn't change just
> because we had already applied the WAL?

Yea, I think the comment is just wrong. I realized the comment still
needed to reference my code, so I've updated it.

> > From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 3 Dec 2025 15:07:24 -0500
> > Subject: [PATCH v40 08/12] Track which relations are modified by a query
> >
> > Save the relids of modified relations in a bitmap in the executor state.
> > A later commit will pass this information down to scan nodes to control
> > whether or not on-access pruning is allowed to set the visibility map.
> > Setting the visibility map during a scan is counterproductive if the
> > query is going to modify the page immediately after.
> >
> > Relations are considered modified if they are the target of INSERT,
> > UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
> > FOR UPDATE/SHARE). All row mark types are included, even those which
> > don't actually modify tuples, because this bitmap is only used as a hint
> > to avoid unnecessary work.
>
> You're probably going to hate me for the question, but is there a reason to
> not compute es_modified_relids at plan time?

Yea, it probably does make more sense there. The only thing is that by
doing it in planner, it could include relids of leaf partitions that
get run-time pruned. But we won't scan those, so it is no issue for
this feature. I'm just wondering if it dilutes the meaning of
"modified relids", though.

In v41, I've implemented it in planner (which also made me realize
parallel workers previously didn't have es_modified_relids, oops).

> > @@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
> >        */
> >       planstate = ExecInitNode(plan, estate, eflags);
> >
> > +#ifdef USE_ASSERT_CHECKING
> > +     CrossCheckModifiedRelids(estate);
> > +#endif
>
> Not sure that buys you much, given it pretty much is just a restatement of the
> code building estate->es_modified_relids.

Yea, now that I've done it in planner, I cross-check in the executor.

> What about checking against PlannedStmt->{resultRelations, permInfos} or

I don't think it makes sense to use permInfos because according to
expand_single_inheritance_child() there is no permission checking for
child RTEs, so I think permInfos won't include everything we need.

> asserting membership at the places that actually lock/modify?

Are you thinking I should also add assertions in ExecInsert, ExecDelete,
ExecUpdate, and ExecLockRows? I think this might be redundant with the
executor cross-check I have now after InitPlan(). (I've done it anyway
so we can discuss.)

> > diff --git a/src/include/access/genam.h b/src/include/access/genam.h
> > index 1a27bf060b3..db102803eb5 100644
> > --- a/src/include/access/genam.h
> > +++ b/src/include/access/genam.h
> > @@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
> >                                                                        Relation indexRelation,
> >                                                                        Snapshot snapshot,
> >                                                                        IndexScanInstrumentation *instrument,
> > -                                                                      int nkeys, int norderbys);
> > +                                                                      int nkeys, int norderbys, uint32 flags);
>
> I'd probably put flags in a position where it's not as easily confused with
> nkeys or norderbys.

Do you mean moving it to just before nkeys and norderbys, or moving it
earlier in the argument list? I did the latter, but I'm not sure if it's
weird to have flags before snapshot, especially since the other table AM
routines all take flags as the last argument.

> > @@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
> >  static inline TableScanDesc
> >  table_beginscan_tidrange(Relation rel, Snapshot snapshot,
> >                                                ItemPointer mintid,
> > -                                              ItemPointer maxtid)
> > +                                              ItemPointer maxtid, uint32 flags)
> >  {
> >       TableScanDesc sscan;
> > -     uint32          flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> > +
> > +     flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> >
> >       sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
>
> Hm. Would it perhaps be a good idea to have an assert as to which flags are
> specified by the "user"? If e.g. another SO_TYPE_* were specified it might
> result in some odd behaviour.
>
> Perhaps this would be best done by adding an argument to
> table_beginscan_common() specifying the "internal" flags (i.e. the ones that
> specified inside table_beginscan_*) and user specified flags?  Then
> table_beginscan_common could check the set of user specified flags being sane.

Yes, good idea. Done in attached v41.

It's unclear to me which flags should be considered internal though. I
think it makes sense that the SO_TYPE* flags are considered internal
because you can only specify one.  But all of the other current
ScanOptions are specified inside table_beginscan_* so do you mean that
we should consider all of those internal flags?
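
For discussion, the split could look roughly like the sketch below. It
is a standalone toy, not the v41 code: the TOY_SO_* values, the internal
mask, the TOY_SO_HINT_REL_READ_ONLY user flag, and both function names
are assumptions made up for illustration. Here I've assumed the scan
type and page-mode flags are internal and anything else is caller
specified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; these are not the real ScanOptions bits. */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_TYPE_TIDRANGESCAN = 1 << 1,
	TOY_SO_ALLOW_PAGEMODE = 1 << 2,
	TOY_SO_HINT_REL_READ_ONLY = 1 << 3	/* hypothetical user flag */
};

/* Flags that only the table_beginscan_* wrappers are allowed to set. */
#define TOY_SO_INTERNAL_MASK \
	(TOY_SO_TYPE_SEQSCAN | TOY_SO_TYPE_TIDRANGESCAN | TOY_SO_ALLOW_PAGEMODE)

/* User-specified flags are sane if they contain no internal bits. */
bool
toy_user_flags_are_sane(uint32_t user_flags)
{
	return (user_flags & TOY_SO_INTERNAL_MASK) == 0;
}

/*
 * Stand-in for the common begin-scan routine: take internal and user
 * flags separately, check the user set, and return the combined flags.
 */
uint32_t
toy_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert(toy_user_flags_are_sane(user_flags));
	return internal_flags | user_flags;
}
```

With that shape, a wrapper like the TID range scan one would pass its
type and page-mode bits as internal_flags and forward the caller's flags
untouched, so a caller passing another scan type trips the assertion
instead of silently changing behaviour.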


> > +      * Some optimizations can only be performed if the query does not modify
> > +      * the underlying relation. Track that here.
> > +      */
> > +     bool            modifies_base_rel;
> >  } IndexFetchHeapData;
>
> Wonder if this should be in the generic IndexFetchTableData?

I added flags to the IndexFetchTableData in much the same way as the
regular table scan descriptor has them.

> > diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
> > index d264a698ff6..a5536ba4ff6 100644
> > --- a/src/test/recovery/t/035_standby_logical_decoding.pl
> > +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> > @@ -296,6 +296,7 @@ wal_level = 'logical'
> >  max_replication_slots = 4
> >  max_wal_senders = 4
> >  autovacuum = off
> > +hot_standby_feedback = on
> >  });
> >  $node_primary->dump_info;
> >  $node_primary->start;
> > @@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
> >  $logstart = -s $node_standby->logfile;
> >
> >  reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
> > -     'no_conflict_', 0, 1);
> > +     'no_conflict_', 1, 0);
> >
> >  # This should not trigger a conflict
> >  wait_until_vacuum_can_remove(
> > --
> > 2.43.0
>
> Why does this patch need to change anything here? Is the test buggy
> independently?

Nope. I guess that was a mistake during development. No change needed.

> > @@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
> >                       prstate->set_all_visible = false;
> >                       prstate->set_all_frozen = false;
> >
> > -                     /* The page should not be marked all-visible */
> > -                     if (PageIsAllVisible(page))
> > -                             heap_fix_vm_corruption(prstate, offnum);
> > -
>
> Huh?

heap_prune_record_prunable() already does the corruption check, so I
don't need to do it separately for INSERT_IN_PROGRESS tuples once we
call heap_prune_record_prunable() for them.

- Melanie


Attachments:

  [text/x-patch] v41-0001-Fix-visibility-map-corruption-in-more-cases.patch (20.5K, 2-v41-0001-Fix-visibility-map-corruption-in-more-cases.patch)
  download | inline diff:
From c87dd03ec309e12247c8ccdc3adf289a1e451255 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v41 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 215 +++++++++++++++++++++++++--
 src/backend/access/heap/vacuumlazy.c |  89 +----------
 src/include/access/heapam.h          |  12 ++
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 215 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..e452d25cae6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		old_vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -162,12 +177,30 @@ typedef struct
 	TransactionId visibility_cutoff_xid;
 } PruneState;
 
+
+/*
+ * Type of visibility map corruption detected on a heap page.  Passed to
+ * heap_page_fix_vm_corruption() so the caller can specify what it found rather
+ * than having the function re-derive the corruption from page state.
+ */
+typedef enum VMCorruptionType
+{
+	/* VM bits are set but the page-level PD_ALL_VISIBLE flag is not */
+	VM_CORRUPT_MISSING_PAGE_HINT,
+	/* LP_DEAD line pointers found on a page marked all-visible */
+	VM_CORRUPT_LPDEAD,
+	/* Tuple not visible to all transactions on a page marked all-visible */
+	VM_CORRUPT_TUPLE_VISIBILITY,
+} VMCorruptionType;
+
 /* Local functions */
 static void prune_freeze_setup(PruneFreezeParams *params,
 							   TransactionId *new_relfrozen_xid,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+										VMCorruptionType ctype);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +208,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +243,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +312,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +329,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +392,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +814,104 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Emit a warning about and fix visibility map corruption on the given page.
+ *
+ * The caller specifies the type of corruption it has already detected via
+ * corruption_type, so that we can emit the appropriate warning. All cases
+ * result in the VM bits being cleared; page-level corruption types also clear
+ * PD_ALL_VISIBLE.
+ *
+ * Must be called while holding an exclusive lock on the heap buffer. Dead
+ * items must have been discovered under that same lock. Although we do not
+ * hold a lock on the VM buffer, it is pinned, and the heap buffer is
+ * exclusively locked, ensuring that no other backend can update the VM bits
+ * corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+							VMCorruptionType corruption_type)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	switch (corruption_type)
+	{
+		case VM_CORRUPT_LPDEAD:
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			break;
+
+		case VM_CORRUPT_TUPLE_VISIBILITY:
+
+			/*
+			 * A HEAPTUPLE_LIVE tuple on an all-visible page can appear to not
+			 * be visible to everyone when
+			 * GetOldestNonRemovableTransactionId() returns a conservative
+			 * value that's older than the real safe xmin. That is not
+			 * corruption -- the PD_ALL_VISIBLE flag is still correct.
+			 *
+			 * However, dead tuple versions, in-progress inserts, and
+			 * in-progress deletes should never appear on a page marked
+			 * all-visible. That indicates real corruption. PD_ALL_VISIBLE
+			 * should have been cleared by the DML operation that deleted or
+			 * inserted the tuple.
+			 */
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			break;
+
+		case VM_CORRUPT_MISSING_PAGE_HINT:
+
+			/*
+			 * As of PostgreSQL 9.2, the visibility map bit should never be
+			 * set if the page-level bit is clear. However, for vacuum, it's
+			 * possible that the bit got cleared after
+			 * heap_vac_scan_next_block() was called, so we must recheck now
+			 * that we have the buffer lock before concluding that the VM is
+			 * corrupt.
+			 */
+			Assert(!PageIsAllVisible(prstate->page));
+			Assert(prstate->old_vmbits & VISIBILITYMAP_VALID_BITS);
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("page is not marked all-visible but visibility map bit is set"),
+					 errcontext("relation \"%s\", page %u",
+								relname, prstate->block)));
+			break;
+
+		default:
+			elog(ERROR, "unrecognized VM corruption type: %d",
+				 (int) corruption_type);
+			break;
+	}
+
+	/*
+	 * Clear PD_ALL_VISIBLE on the heap page if it is set.
+	 * VM_CORRUPT_MISSING_PAGE_HINT is already clear by definition, so avoid
+	 * marking the buffer dirty.
+	 */
+	if (corruption_type != VM_CORRUPT_MISSING_PAGE_HINT)
+	{
+		Assert(PageIsAllVisible(prstate->page));
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->old_vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +972,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	/*
+	 * If the VM is set but PD_ALL_VISIBLE is clear, fix that corruption
+	 * before pruning and freezing so that the page and VM start out in a
+	 * consistent state.
+	 */
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
+									VM_CORRUPT_MISSING_PAGE_HINT);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1125,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->old_vmbits = prstate.old_vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1448,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1459,14 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum,
+									VM_CORRUPT_TUPLE_VISIBILITY);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1550,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1698,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1714,11 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_page_fix_vm_corruption(prstate, offnum,
+											VM_CORRUPT_TUPLE_VISIBILITY);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1743,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1810,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c57432670e7..56722556417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -432,11 +432,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1989,81 +1984,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2095,6 +2015,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2204,18 +2125,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.old_vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..00134012137 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
+	 * cleared if VM corruption is found and corrected.
+	 */
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a4a2ed07816..480614d483b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3273,6 +3273,7 @@ UserAuth
 UserContext
 UserMapping
 UserOpts
+VMCorruptionType
 VacAttrStats
 VacAttrStatsP
 VacDeadItemsInfo
-- 
2.43.0



  [text/x-patch] v41-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (7.7K, 3-v41-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
  download | inline diff:
From 0c9f91eec0127bc914c7fbe79256c6e5b689cde8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v41 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid this, if a page is already all-frozen or it is
all-visible and no freezing will be attempted, we exit early. We can't
exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 97 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 10 +++
 src/include/access/heapam.h          |  1 +
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e452d25cae6..19a72ac6b27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,12 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/*
+	 * True if the page can bypass full page inspection during pruning and
+	 * freezing based on its visibility map status and the caller's options.
+	 */
+	bool		fast_path;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -201,6 +207,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneState *prstate);
 static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 										VMCorruptionType ctype);
+static void prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -329,7 +336,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			params.options = 0;
+			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -398,6 +405,16 @@ prune_freeze_setup(PruneFreezeParams *params,
 												   prstate->block,
 												   &prstate->vmbuffer);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can skip pruning and freezing entirely.
+	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
+	 */
+	prstate->fast_path = ((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+						   !prstate->attempt_freeze)) &&
+		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -913,6 +930,73 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	prstate->old_vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->old_vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling prune_freeze_bypass(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+		   ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->old_vmbits = prstate->old_vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+		{
+			/*
+			 * Now that we've found an actual tuple, set hastup. If the page
+			 * is entirely LP_UNUSED, we want vacuum to still truncate it.
+			 */
+			presult->hastup = true;
+			prstate->live_tuples++;
+		}
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -982,6 +1066,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
 									VM_CORRUPT_MISSING_PAGE_HINT);
 
+	/*
+	 * If the visibility map status allows it, bypass pruning and freezing
+	 * entirely. This must be done after fixing any discrepancy between the
+	 * page-level visibility hint and the VM.
+	 */
+	if (prstate.fast_path)
+	{
+		prune_freeze_bypass(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56722556417..1a446050d85 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2044,6 +2044,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
+	/*
+	 * Allow skipping full inspection of pages that the VM indicates are
+	 * already all-frozen (which may be scanned due to SKIP_PAGES_THRESHOLD).
+	 * However, if DISABLE_PAGE_SKIPPING was specified, we can't trust the VM,
+	 * so we must examine the page to make sure it is truly all-frozen and fix
+	 * it otherwise.
+	 */
+	if (vacrel->skipwithvm)
+		params.options |= HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+
 	heap_page_prune_and_freeze(&params,
 							   &presult,
 							   &vacrel->offnum,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 00134012137..305ecc31a9e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
-- 
2.43.0
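
For readers skimming the archive, the fast-path condition that the patch above computes in prune_freeze_setup() can be restated as a small standalone predicate. This is only a sketch: the names VM_ALL_VISIBLE, VM_ALL_FROZEN, PRUNE_ALLOW_FAST_PATH and can_use_fast_path() are hypothetical stand-ins, and the two VM bit values are assumed for illustration rather than taken from the PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VM status bits (values assumed). */
#define VM_ALL_VISIBLE			0x01
#define VM_ALL_FROZEN			0x02

/* Mirrors the option bit added by the patch, under a hypothetical name. */
#define PRUNE_ALLOW_FAST_PATH	(1 << 2)

/*
 * Returns true when there is no remaining pruning or freezing work:
 * the page is already all-frozen, or it is already all-visible and the
 * caller is not attempting to freeze.  The caller must also have opted in.
 */
static bool
can_use_fast_path(uint8_t old_vmbits, bool attempt_freeze, int options)
{
	bool		nothing_to_do;

	nothing_to_do = (old_vmbits & VM_ALL_FROZEN) != 0 ||
		((old_vmbits & VM_ALL_VISIBLE) != 0 && !attempt_freeze);

	return nothing_to_do && (options & PRUNE_ALLOW_FAST_PATH) != 0;
}
```

Note that an all-visible (but not all-frozen) page still takes the slow path when freezing is attempted, since freezing may have work to do there.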



  [text/x-patch] v41-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (12.3K, 4-v41-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 933ccfc1fa4f652c9a9f0be7cac5abebf4ddf7c1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v41 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
In the future, we could also add a heuristic to update GlobalVisState
more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 ++++++++++
 src/backend/access/heap/pruneheap.c         | 48 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 48 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 79 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 19a72ac6b27..f437579076e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -166,10 +166,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -1085,6 +1088,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1753,29 +1767,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..2a94ba3a387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2089,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2852,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3614,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3642,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3661,7 +3661,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3742,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3751,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3790,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..f9dbd70c1c4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
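
The "check once per page" idea from the commit message above can be sketched in isolation. This is a simplified model, not the patch's code: GlobalVisState is replaced by a single hypothetical horizon XID, and Xid, xid_follows(), xid_maybe_running() and page_all_visible() are names invented for this sketch. The counter exists only to show that the more expensive test runs at most once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

#define INVALID_XID			0u
#define FIRST_NORMAL_XID	3u	/* smaller values are special (e.g. frozen) */

static bool
xid_is_normal(Xid x)
{
	return x >= FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b" for normal XIDs. */
static bool
xid_follows(Xid a, Xid b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Counts how often the (notionally expensive) horizon test runs. */
static int	horizon_checks = 0;

/* Assumption: any XID at or after 'horizon' may still be seen as running. */
static bool
xid_maybe_running(Xid horizon, Xid x)
{
	horizon_checks++;
	return !xid_follows(horizon, x);
}

/*
 * Scan the xmins of committed live tuples, remember only the newest one,
 * and test that single XID against the horizon after the loop.
 */
static bool
page_all_visible(const Xid *xmins, size_t n, Xid horizon)
{
	Xid			newest = INVALID_XID;

	for (size_t i = 0; i < n; i++)
	{
		if (xid_is_normal(xmins[i]) && xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	if (xid_is_normal(newest) && xid_maybe_running(horizon, newest))
		return false;

	return true;
}
```

If the newest xmin is older than the horizon, every older xmin on the page is too, which is why the single test suffices in this model.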



  [text/x-patch] v41-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.4K, 5-v41-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From de07645b084c2e01050ac5fa8a6c80240842673e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v41 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 137 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f437579076e..9451e9417f7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*
 	 * True if the page can bypass full page inspection during pruning and
 	 * freezing based on its visibility map status and the caller's options.
@@ -166,14 +169,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -183,7 +178,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 
@@ -471,53 +465,42 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -747,7 +730,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1021,9 +1003,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set.  presult->set_all_frozen is always false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1094,9 +1075,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1247,7 +1228,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1708,6 +1689,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1755,32 +1737,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a94ba3a387..8599dd7fcfa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -478,7 +478,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2828,7 +2828,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2854,14 +2854,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2902,7 +2902,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3616,7 +3616,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3624,7 +3624,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3647,7 +3647,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3665,7 +3665,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3675,7 +3675,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3764,9 +3764,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3796,8 +3796,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0
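
For readers following the rename, the "track newest xmin" loop in the patch above can be modelled outside the server. This is only a minimal sketch: SketchXid, sketch_xid_follows() and sketch_newest_live_xid() are hypothetical stand-ins, not PostgreSQL APIs, and the wraparound comparison is a simplified imitation of what TransactionIdFollows() is understood to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and its special values. */
typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID      ((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* True if a is logically newer than b, allowing for 32-bit wraparound. */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	/* Special (non-normal) XIDs compare as plain unsigned values. */
	if (!sketch_xid_is_normal(a) || !sketch_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Return the newest normal xmin in xmins[], or the invalid XID if every
 * entry is a special XID (for example, if all tuples are frozen).
 */
static SketchXid
sketch_newest_live_xid(const SketchXid *xmins, size_t n)
{
	SketchXid	newest = SKETCH_INVALID_XID;

	assert(xmins != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (sketch_xid_follows(xmins[i], newest) &&
			sketch_xid_is_normal(xmins[i]))
			newest = xmins[i];
	}
	return newest;
}
```

With this model, a page whose tuples are all frozen yields the invalid XID, which is consistent with the Assert(!TransactionIdIsValid(newest_live_xid)) in the all_frozen branch above.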



  [text/x-patch] v41-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.1K, 6-v41-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From b6975991b391b979bbffac9cc0bd8896ba181ba8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v41 05/12] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 244 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++-----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 204 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9451e9417f7..dd8ac173ca1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -163,21 +182,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 
@@ -232,6 +236,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -398,6 +403,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -915,6 +921,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	prstate->old_vmbits = 0;
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * is not being attempted, there is no remaining work and we can bypass the
@@ -948,8 +990,6 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -984,7 +1024,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -999,12 +1040,10 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1032,8 +1071,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1125,6 +1166,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1146,14 +1212,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1167,6 +1236,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1174,29 +1264,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1206,33 +1279,70 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8599dd7fcfa..d144e0f642b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -2021,8 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2073,32 +2064,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2119,6 +2084,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2132,71 +2108,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3612,7 +3523,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f9dbd70c1c4..b9577c24844 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0
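
The two decisions this patch moves into heap_page_prune_and_freeze(), whether to set the VM and which snapshot conflict horizon to log, can be restated as a standalone sketch. Everything prefixed Sketch/sketch_ is hypothetical: the struct is a cut-down imitation of PruneState, the flag macros stand in for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and the XID comparison is a simplified model rather than the server's implementation. Unlike the patch, which zeroes new_vmbits in prune_freeze_setup(), the sketch resets it at the top of the decision function so that it is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID      ((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

/* Stand-ins for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN
} SketchPruneReason;

/* Cut-down, hypothetical imitation of the fields PruneState carries. */
typedef struct SketchPruneState
{
	bool		set_all_visible;
	bool		set_all_frozen;
	uint8_t		old_vmbits;
	uint8_t		new_vmbits;
	SketchXid	newest_live_xid;
	SketchXid	freeze_conflict_xid;
	SketchXid	latest_xid_removed;
} SketchPruneState;

/* Simplified wraparound-aware "a is newer than b" comparison. */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	if (a < SKETCH_FIRST_NORMAL_XID || b < SKETCH_FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Decide whether the VM needs updating; leaves new_vmbits at 0 if not. */
static bool
sketch_will_set_vm(SketchPruneState *st, SketchPruneReason reason)
{
	assert(!st->set_all_frozen || st->set_all_visible);

	st->new_vmbits = 0;
	if (reason == SKETCH_PRUNE_ON_ACCESS || !st->set_all_visible)
		return false;

	st->new_vmbits = SKETCH_VM_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= SKETCH_VM_ALL_FROZEN;

	/* Nothing to do if the VM already holds exactly these bits. */
	if (st->new_vmbits == st->old_vmbits)
	{
		st->new_vmbits = 0;
		return false;
	}
	return true;
}

/* The record's horizon is the newest XID required by any change in it. */
static SketchXid
sketch_conflict_xid(const SketchPruneState *st,
					bool do_set_vm, bool do_freeze, bool do_prune)
{
	SketchXid	conflict = SKETCH_INVALID_XID;

	if (do_set_vm)
		conflict = st->newest_live_xid;
	if (do_freeze && sketch_xid_follows(st->freeze_conflict_xid, conflict))
		conflict = st->freeze_conflict_xid;
	if (do_prune && sketch_xid_follows(st->latest_xid_removed, conflict))
		conflict = st->latest_xid_removed;
	return conflict;
}
```

The point of the combined horizon is that a single record now covers pruning, freezing and the VM update, so its conflict XID has to be at least as new as the one each change would have needed on its own.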



  [text/x-patch] v41-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (2.7K, 7-v41-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 561637633c1417af7dea0509bbbf55dca3c2fead Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v41 06/12] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so if we make this change we can remove all of the
XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d144e0f642b..de93bff4a8e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1928,9 +1928,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1948,13 +1951,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
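
The "newly set" counters that vacuum reports can be derived from the VM bits before and after the update. The sketch below restates the classification that patch 0005 performs with do_set_vm and set_all_frozen, here keyed on the new bits instead. SketchVmChange and sketch_classify_vm_change() are hypothetical names, and the flag macros are stand-ins for the real visibility map flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical result type mirroring the three newly_* flags. */
typedef struct SketchVmChange
{
	bool		newly_all_visible;
	bool		newly_all_visible_frozen;
	bool		newly_all_frozen;
} SketchVmChange;

/*
 * Classify a VM update. new_vmbits == 0 means "the VM is not being set",
 * matching the invariant asserted in the patch.
 */
static SketchVmChange
sketch_classify_vm_change(uint8_t old_vmbits, uint8_t new_vmbits)
{
	SketchVmChange result = {false, false, false};
	bool		frozen = (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0;

	if (new_vmbits == 0)
		return result;

	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0)
	{
		/* Page was not all-visible before. */
		result.newly_all_visible = true;
		result.newly_all_visible_frozen = frozen;
	}
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 && frozen)
	{
		/* Page was already all-visible and is now also all-frozen. */
		result.newly_all_frozen = true;
	}
	return result;
}
```

Under this model, a page with no VM bits set that is marked both all-visible and all-frozen counts as newly all-visible and newly all-visible-and-frozen, which lines up with the two counters incremented for empty pages in the patch above.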



  [text/x-patch] v41-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.5K, 8-v41-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 8c63ade694ce8ed5bcb2d67c15203f0bd41d6b3f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v41 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 150 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 63 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d64c403f2f0 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * XXX: We should consider not masking PD_ALL_VISIBLE during WAL
+	 * consistency checking.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd8ac173ca1..fac7194dcba 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1253,8 +1253,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index de93bff4a8e..461fdf4ed83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1951,11 +1951,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2833,9 +2833,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..21e89c38f0a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,31 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if a set
+ * bit is wrong, vacuum may skip the page and incorrectly advance relfrozenxid
+ * or an index-only scan may skip the heap fetch and return wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +231,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +251,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 480614d483b..f12f2deec43 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4416,7 +4416,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
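
Reader's note on the patch above: the renamed visibilitymap_set() now only
flips bits in an already pinned and locked VM page. The bit arithmetic can be
sketched in standalone C as below. This is a minimal model, not the PostgreSQL
code: the vm_* names are hypothetical, the constants are assumed to match an
8192-byte block with a 24-byte page header, and buffer locking, dirtying, and
WAL are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, modeled on visibilitymapdefs.h and visibilitymap.c. */
#define VM_ALL_VISIBLE		0x01
#define VM_ALL_FROZEN		0x02
#define VM_VALID_BITS		0x03
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4		/* 8 bits / 2 bits per heap block */
#define MAPSIZE				8168	/* assumption: 8192 - 24 byte header */
#define HEAPBLOCKS_PER_PAGE	(MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM page covers this heap block (hypothetical helper). */
static uint32_t
vm_map_block(uint32_t heapBlk)
{
	return heapBlk / HEAPBLOCKS_PER_PAGE;
}

/* Byte within the VM page's contents that holds this heap block's bits. */
static uint32_t
vm_map_byte(uint32_t heapBlk)
{
	return (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within that byte. */
static uint8_t
vm_map_offset(uint32_t heapBlk)
{
	return (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/* Read the two status bits for a heap block. */
static uint8_t
vm_get_status(const uint8_t *map, uint32_t heapBlk)
{
	return (map[vm_map_byte(heapBlk)] >> vm_map_offset(heapBlk)) & VM_VALID_BITS;
}

/*
 * Set the requested bits; returns 1 if the map changed, 0 if the bits were
 * already in the requested state. The caller is assumed to have set
 * PD_ALL_VISIBLE on the heap page first and to handle WAL itself.
 */
static int
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = vm_map_byte(heapBlk);
	uint8_t		mapOffset = vm_map_offset(heapBlk);

	assert((flags & VM_VALID_BITS) == flags);
	/* never set all-frozen without also setting all-visible */
	assert(flags != VM_ALL_FROZEN);

	if (vm_get_status(map, heapBlk) == flags)
		return 0;

	map[mapByte] |= (uint8_t) (flags << mapOffset);
	return 1;
}
```

With these assumed constants, heap block 5 lands in byte 1 at bit offset 2, so
setting both flags ORs 0x0C into that byte. The real function additionally
marks the VM buffer dirty, and the caller sets the VM page LSN from the WAL
record that covers the heap change.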



  [text/x-patch] v41-0008-Track-which-relations-are-modified-by-a-query.patch (8.7K, 9-v41-0008-Track-which-relations-are-modified-by-a-query.patch)
From f5de33c173ca5216ea042475ec8ee2d8f0ddd3a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v41 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the PlannedStmt.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c        | 47 ++++++++++++++++++++++++++
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  4 +++
 src/backend/executor/nodeModifyTable.c | 18 ++++++++++
 src/backend/optimizer/plan/planner.c   | 21 +++++++++++-
 src/include/nodes/plannodes.h          | 10 ++++++
 6 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..3f134f9a34d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -90,6 +90,9 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
 										 Bitmapset *modifiedCols,
 										 AclMode requiredPerms);
 static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
+#ifdef USE_ASSERT_CHECKING
+static void ExecCheckModifiedRelIds(EState *estate);
+#endif
 static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
 										TupleTableSlot *slot,
@@ -827,6 +830,46 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 }
 
 
+/*
+ * ExecCheckModifiedRelIds
+ *		Verify that every relation the executor actually opened for modification
+ *		or row locking is present in the planner's modifiedRelids.
+ *
+ * The planner's set may be a superset of what the executor touches, because it
+ * includes partitions that were pruned at runtime and parent row marks that the
+ * executor skips.
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+ExecCheckModifiedRelIds(EState *estate)
+{
+	PlannedStmt *plannedstmt = estate->es_plannedstmt;
+	Bitmapset  *executor_relids = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);
+
+		if (rri->ri_RangeTableIndex != 0)
+			executor_relids = bms_add_member(executor_relids,
+											 rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (int i = 0; i < estate->es_range_table_size; i++)
+		{
+			if (estate->es_rowmarks[i] != NULL)
+				executor_relids = bms_add_member(executor_relids,
+												 estate->es_rowmarks[i]->rti);
+		}
+	}
+	Assert(bms_is_subset(executor_relids, plannedstmt->modifiedRelids));
+	bms_free(executor_relids);
+}
+#endif
+
+
 /* ----------------------------------------------------------------
  *		InitPlan
  *
@@ -992,6 +1035,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	ExecCheckModifiedRelIds(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..d67f24fca8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,10 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		/* verify this relation is in the planner's modifiedRelids */
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..6b4ee4f9378 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,16 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * Verify this relation is in the planner's set of modified relations.
+	 * Partitions opened by tuple routing have ri_RangeTableIndex == 0 because
+	 * they have no range table entry, so we can only check relations that are
+	 * in the range table.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1533,10 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2219,10 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..847af979e31 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.  This is a
+	 * superset of what the executor will actually modify/lock at runtime,
+	 * because runtime partition pruning may eliminate some result relations,
+	 * and parent row marks are included here but skipped by the executor.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+
+		modifiedRelids = bms_add_member(modifiedRelids, rc->rti);
+	}
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..841c7707c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,16 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 *
+	 * Computed by the planner, this is a superset of what the executor will
+	 * actually touch at runtime, because it includes partitions that may be
+	 * pruned and parent row marks that the executor skips.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0
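
The relationship between the planner-side computation and the executor-side Asserts in the patch above can be sketched in standalone C. This is only an illustration, not PostgreSQL code: `ToyRelids` and the `toy_*` helpers are hypothetical stand-ins for `Bitmapset`, `bms_add_member()` and `bms_is_member()`, and the set is limited to RT indexes 1..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset; holds RT indexes 1..63 only. */
typedef uint64_t ToyRelids;

static ToyRelids
toy_add_member(ToyRelids set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static bool
toy_is_member(int rti, ToyRelids set)
{
	return rti > 0 && rti < 64 && (set & (UINT64_C(1) << rti)) != 0;
}

/* Planner side: union of result relations and row-mark relations. */
static ToyRelids
toy_compute_modified(const int *result_rels, size_t nresult,
					 const int *rowmark_rtis, size_t nmarks)
{
	ToyRelids	modified = 0;

	for (size_t i = 0; i < nresult; i++)
		modified = toy_add_member(modified, result_rels[i]);
	for (size_t i = 0; i < nmarks; i++)
		modified = toy_add_member(modified, rowmark_rtis[i]);
	return modified;
}

/*
 * Executor side: the condition the new Asserts check.  An index of 0 means
 * the relation has no range table entry (a partition opened by tuple
 * routing), so it cannot be looked up in the set.
 */
static bool
toy_modification_expected(int range_table_index, ToyRelids modified)
{
	return range_table_index == 0 ||
		toy_is_member(range_table_index, modified);
}
```

Because the set is a superset of what is touched at runtime, the check can only confirm that a modified relation was anticipated by the planner; it says nothing about relations that are in the set but end up pruned.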



  [text/x-patch] v41-0009-Thread-flags-through-begin-scan-APIs.patch (32.9K, 10-v41-0009-Thread-flags-through-begin-scan-APIs.patch)
From a2aca8dcd8e944058ded737184343a6960f7cee6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v41 09/12] Thread flags through begin-scan APIs

Add a caller-settable flags parameter to most of the table_beginscan_*
wrappers, index_beginscan(), index_beginscan_parallel(),
table_index_fetch_begin(), and the table AM callback
index_fetch_begin(). This lets callers pass additional context to be
used when building the scan descriptors.

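For table scans, table_beginscan_common() asserts that the
caller-provided flags do not overlap the new SO_INTERNAL_FLAGS mask and
then ORs them into the internally set flags.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().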

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
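
As a reading aid for the flag handling described above, here is a minimal standalone sketch of how caller-provided flags are kept separate from internally set ones. It is not the patch's code: the `TOY_*` names are hypothetical, only a few flags are modeled, the bit positions are illustrative, and `TOY_SO_HINT_EXAMPLE` is an invented caller flag (this patch itself defines no caller-settable flags).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of scan options; names and values are assumptions. */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_TYPE_BITMAPSCAN = 1 << 1,
	TOY_SO_ALLOW_STRAT = 1 << 6,
	TOY_SO_ALLOW_SYNC = 1 << 7,
	TOY_SO_ALLOW_PAGEMODE = 1 << 8,
	TOY_SO_TEMP_SNAPSHOT = 1 << 9,
	TOY_SO_HINT_EXAMPLE = 1 << 10	/* hypothetical caller-settable flag */
};

/* Flags only the begin-scan wrappers may set. */
#define TOY_SO_INTERNAL_FLAGS \
	(TOY_SO_TYPE_SEQSCAN | TOY_SO_TYPE_BITMAPSCAN | \
	 TOY_SO_ALLOW_STRAT | TOY_SO_ALLOW_SYNC | TOY_SO_ALLOW_PAGEMODE | \
	 TOY_SO_TEMP_SNAPSHOT)

/* The condition the patch enforces with an Assert. */
static bool
toy_user_flags_valid(uint32_t user_flags)
{
	return (user_flags & TOY_SO_INTERNAL_FLAGS) == 0;
}

/* Shape of a sequential-scan wrapper: internal flags OR caller flags. */
static uint32_t
toy_seqscan_flags(uint32_t user_flags)
{
	uint32_t	internal_flags = TOY_SO_TYPE_SEQSCAN |
		TOY_SO_ALLOW_STRAT | TOY_SO_ALLOW_SYNC | TOY_SO_ALLOW_PAGEMODE;

	return internal_flags | user_flags;
}
```

Keeping the two sets disjoint means a caller cannot change the scan type or snapshot handling by accident; passing 0 yields exactly the flags the wrapper would have set before the change.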
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..fb791c7990b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1160,7 +1160,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 67e42e5df29..cc2ec9393a8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22881,7 +22881,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23345,7 +23345,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..decfd792809 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -790,7 +791,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +857,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..a37fa9abece 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1726,7 +1728,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1792,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 86b55c9bb8b..1d64d286881 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b9577c24844..e32f28d7acb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
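
The split between caller-supplied flags and internally set flags in the
patch above can be sketched in isolation. This is a minimal standalone
sketch, not PostgreSQL code: every DEMO_ name and every flag value below is
hypothetical, and the composition of DEMO_SO_INTERNAL_FLAGS is an assumption
(the real SO_INTERNAL_FLAGS definition is not shown in this part of the
patch). It only illustrates the pattern of asserting that user flags never
overlap the internal ones before merging the two sets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, chosen for illustration only. */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_STRAT = 1 << 1,
	DEMO_SO_ALLOW_SYNC = 1 << 2,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 3,
	DEMO_SO_USER_HINT = 1 << 4	/* stands in for any caller-supplied hint */
};

/* Assumed: flags that only the table AM wrappers themselves may set. */
#define DEMO_SO_INTERNAL_FLAGS \
	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_STRAT | \
	 DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE)

/* Merge the two flag sets, rejecting user flags that claim internal bits. */
static uint32_t
demo_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert((user_flags & DEMO_SO_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* Wrapper that owns the internal flags and forwards the caller's hints. */
static uint32_t
demo_beginscan(uint32_t user_flags)
{
	uint32_t	internal_flags = DEMO_SO_TYPE_SEQSCAN |
		DEMO_SO_ALLOW_STRAT | DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE;

	return demo_beginscan_common(internal_flags, user_flags);
}
```

Calling demo_beginscan(DEMO_SO_USER_HINT) returns the internal flags with
the hint bit added, while passing an internal bit as a user flag trips the
assertion.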



  [text/x-patch] v41-0010-Pass-down-information-on-table-modification-to-s.patch (11.3K, 11-v41-0010-Pass-down-information-on-table-modification-to-s.patch)
  download | inline diff:
From 512c2a19651aecab3a15712992586dce288b83fd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v41 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           | 10 ++++++++++
 8 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index decfd792809..b977719c295 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -791,7 +794,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -857,7 +863,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a37fa9abece..ad460c11679 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1728,7 +1734,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1792,7 +1800,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..31c4192b67e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,16 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ */
+static inline bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
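
How a scan node might turn "is this relation modified by the query?" into a
scan flag can be sketched without the executor. This is a minimal standalone
sketch under stated assumptions: a 64-bit mask stands in for the Bitmapset
that the patch consults via bms_is_member(), and all Demo/demo_ names are
hypothetical. Only the flag value 1 << 10 is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns to SO_HINT_REL_READ_ONLY. */
#define DEMO_SO_HINT_REL_READ_ONLY (1u << 10)

/* Hypothetical stand-in for PlannedStmt: one bit per modified relid. */
typedef struct DemoPlannedStmt
{
	uint64_t	modified_relids;
} DemoPlannedStmt;

/* True if the query does not modify the relation with this index. */
static bool
demo_scan_rel_is_read_only(const DemoPlannedStmt *stmt, unsigned scanrelid)
{
	assert(scanrelid < 64);		/* limit of this sketch's bitmask */
	return (stmt->modified_relids & (UINT64_C(1) << scanrelid)) == 0;
}

/* Compute the user flags a scan node would pass to the begin-scan call. */
static uint32_t
demo_scan_flags(const DemoPlannedStmt *stmt, unsigned scanrelid)
{
	return demo_scan_rel_is_read_only(stmt, scanrelid) ?
		DEMO_SO_HINT_REL_READ_ONLY : 0;
}
```

With only relid 2 marked as modified, a scan of relid 1 gets the read-only
hint and a scan of relid 2 gets no flags.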



  [text/x-patch] v41-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 12-v41-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From f02a7e880af91c4fb14edc75076d448f00462270 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v41 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which
triggers a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fac7194dcba..deb7b948c1e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -236,7 +238,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -257,7 +260,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -339,6 +343,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -395,6 +401,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -474,9 +481,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -928,21 +934,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1168,7 +1190,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 461fdf4ed83..37dba4cb3ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2033,7 +2033,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e32f28d7acb..78c85536d39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
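
The decision made by heap_page_will_set_vm() in the patch above can be read
as a small predicate. The following is a minimal standalone sketch, not the
patch's implementation: the struct and its boolean fields are hypothetical
stand-ins for PruneState, BufferIsDirty(), and XLogCheckBufferNeedsBackup(),
and the sketch omits the side effects (clearing set_all_visible and
set_all_frozen, computing new_vmbits).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	DEMO_PRUNE_ON_ACCESS,
	DEMO_PRUNE_VACUUM_SCAN
} DemoPruneReason;

/* Hypothetical snapshot of the inputs the heuristic looks at. */
typedef struct DemoPageState
{
	bool		attempt_set_vm;		/* caller asked for VM setting */
	bool		set_all_visible;	/* page turned out all-visible */
	bool		buffer_is_dirty;	/* heap buffer already dirty */
	bool		needs_fpi;			/* WAL record would need an FPI */
} DemoPageState;

/* Return true if the VM should be set for this page. */
static bool
demo_will_set_vm(const DemoPageState *st, DemoPruneReason reason,
				 bool do_prune, bool do_freeze)
{
	if (!st->attempt_set_vm)
		return false;
	if (!st->set_all_visible)
		return false;

	/*
	 * On-access and otherwise not modifying the page: skip if setting the
	 * VM would newly dirty the page or force a full-page image.
	 */
	if (reason == DEMO_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_is_dirty || st->needs_fpi))
		return false;

	return true;
}
```

In this sketch a clean, all-visible page is left alone on access unless
pruning or freezing is already going to modify it, whereas vacuum sets the
VM for the same page.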



  [text/x-patch] v41-0012-Set-pd_prune_xid-on-insert.patch (8.8K, 13-v41-0012-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 99c5b17c45e413680919f2623fc528bb35873dc9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v41 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set the VM all-visible
after a page is filled with newly inserted tuples the first time it is
read. This means the page will get set all-visible when it is still in
shared buffers and avoid potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also enables index-only
scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index deb7b948c1e..9f8c83aa7d3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -279,7 +279,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-20 23:37  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-20 23:37 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Mar 19, 2026 at 10:38 PM Melanie Plageman
<[email protected]> wrote:
>
> Thanks for the detailed review! Unless otherwise specified, attached
> v41 includes all of your straightforward review points.

I've made several minor updates and two notable updates in the attached v42:

- There is no longer a separate log_newpage_buffer() call for empty page
vacuum. log_heap_prune_and_freeze() now handles pages without a valid
LSN on its own.
- The heap_page_is_all_visible() assertion should be stable even once
it uses GlobalVisState, because I've updated the GlobalVisState
functions to avoid updating the GlobalVisState boundaries in this case.

- Melanie


Attachments:

  [text/x-patch] v42-0001-Fix-visibility-map-corruption-in-more-cases.patch (20.5K, 2-v42-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From 866d7257a7024a018d1c39b09c0026bea374f8f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v42 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once
and a single VM page covers a large number of heap pages.
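
As a back-of-the-envelope illustration of that last point (a sketch, not
part of this patch): assuming the default 8 kB block size, a 24-byte page
header, and two VM bits per heap block, one VM page tracks 32672 heap
blocks, or roughly 255 MB of heap. The function names below are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: two VM bits (all-visible, all-frozen) per heap block. */
#define SKETCH_BITS_PER_HEAPBLOCK 2

/* Number of heap blocks whose VM bits fit on a single VM page. */
uint32_t
sketch_heap_blocks_per_vm_page(uint32_t blcksz, uint32_t page_header_size)
{
	assert(blcksz > page_header_size);

	/* Bytes of the VM page left for the bitmap after the page header. */
	uint32_t	map_bytes = blcksz - page_header_size;

	/* 8 bits per byte / 2 bits per heap block = 4 heap blocks per byte. */
	return map_bytes * (8 / SKETCH_BITS_PER_HEAPBLOCK);
}

/* Bytes of heap covered by a single VM page. */
uint64_t
sketch_heap_bytes_per_vm_page(uint32_t blcksz, uint32_t page_header_size)
{
	return (uint64_t) sketch_heap_blocks_per_vm_page(blcksz, page_header_size)
		* blcksz;
}
```

With those assumed defaults, sketch_heap_blocks_per_vm_page(8192, 24)
gives 32672, so a sequential scan re-pins the VM only once per ~255 MB
of heap.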

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 217 +++++++++++++++++++++++++--
 src/backend/access/heap/vacuumlazy.c |  89 +----------
 src/include/access/heapam.h          |  12 ++
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 217 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..be3ae21f94c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		old_vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -162,12 +177,30 @@ typedef struct
 	TransactionId visibility_cutoff_xid;
 } PruneState;
 
+
+/*
+ * Type of visibility map corruption detected on a heap page. Passed to
+ * heap_page_fix_vm_corruption() so the caller can specify what it found
+ * rather than having the function rederive the corruption from page state.
+ */
+typedef enum VMCorruptionType
+{
+	/* VM bits are set but the page-level PD_ALL_VISIBLE flag is not */
+	VM_CORRUPT_MISSING_PAGE_HINT,
+	/* LP_DEAD line pointers found on a page marked all-visible */
+	VM_CORRUPT_LPDEAD,
+	/* Tuple not visible to all transactions on a page marked all-visible */
+	VM_CORRUPT_TUPLE_VISIBILITY,
+} VMCorruptionType;
+
 /* Local functions */
 static void prune_freeze_setup(PruneFreezeParams *params,
 							   TransactionId *new_relfrozen_xid,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+										VMCorruptionType ctype);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +208,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +243,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +312,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +329,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +392,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +814,106 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Emit a warning about and fix visibility map corruption on the given page.
+ *
+ * The caller specifies the type of corruption it has already detected via
+ * corruption_type, so that we can emit the appropriate warning. All cases
+ * result in the VM bits being cleared; page-level corruption types also clear
+ * PD_ALL_VISIBLE.
+ *
+ * Must be called while holding an exclusive lock on the heap buffer. Dead
+ * items must have been discovered under that same lock. Although we do not
+ * hold a lock on the VM buffer, it is pinned, and the heap buffer is
+ * exclusively locked, ensuring that no other backend can update the VM bits
+ * corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+							VMCorruptionType corruption_type)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+	bool		clear_vm = false;
+	bool		clear_heap = false;
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	switch (corruption_type)
+	{
+		case VM_CORRUPT_LPDEAD:
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			clear_vm = true;
+			break;
+
+		case VM_CORRUPT_TUPLE_VISIBILITY:
+
+			/*
+			 * A HEAPTUPLE_LIVE tuple on an all-visible page can appear to not
+			 * be visible to everyone when
+			 * GetOldestNonRemovableTransactionId() returns a conservative
+			 * value that's older than the real safe xmin. That is not
+			 * corruption -- the PD_ALL_VISIBLE flag is still correct.
+			 *
+			 * However, dead tuple versions, in-progress inserts, and
+			 * in-progress deletes should never appear on a page marked
+			 * all-visible. That indicates real corruption. PD_ALL_VISIBLE
+			 * should have been cleared by the DML operation that deleted or
+			 * inserted the tuple.
+			 */
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			clear_vm = true;
+			break;
+
+		case VM_CORRUPT_MISSING_PAGE_HINT:
+
+			/*
+			 * As of PostgreSQL 9.2, the visibility map bit should never be
+			 * set if the page-level bit is clear. However, for vacuum, it's
+			 * possible that the bit got cleared after
+			 * heap_vac_scan_next_block() was called, so we must recheck now
+			 * that we have the buffer lock before concluding that the VM is
+			 * corrupt.
+			 */
+			Assert(!PageIsAllVisible(prstate->page));
+			Assert(prstate->old_vmbits & VISIBILITYMAP_VALID_BITS);
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("page is not marked all-visible but visibility map bit is set"),
+					 errcontext("relation \"%s\", page %u",
+								relname, prstate->block)));
+			clear_vm = true;
+			clear_heap = true;
+			break;
+	}
+
+	Assert(clear_heap || clear_vm);
+
+	/* Avoid marking the buffer dirty if PD_ALL_VISIBLE is already clear */
+	if (clear_heap)
+	{
+		Assert(PageIsAllVisible(prstate->page));
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (clear_vm)
+	{
+		visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		prstate->old_vmbits = 0;
+	}
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +974,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	/*
+	 * If the VM is set but PD_ALL_VISIBLE is clear, fix that corruption
+	 * before pruning and freezing so that the page and VM start out in a
+	 * consistent state.
+	 */
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
+									VM_CORRUPT_MISSING_PAGE_HINT);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1127,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->old_vmbits = prstate.old_vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1450,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1461,14 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum,
+									VM_CORRUPT_TUPLE_VISIBILITY);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1552,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1700,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1716,11 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_page_fix_vm_corruption(prstate, offnum,
+											VM_CORRUPT_TUPLE_VISIBILITY);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1745,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1812,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c57432670e7..56722556417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -432,11 +432,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1989,81 +1984,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2095,6 +2015,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2204,18 +2125,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.old_vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..00134012137 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
+	 * cleared if VM corruption is found and corrected.
+	 */
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0042c33fa66..0c07c945f05 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3277,6 +3277,7 @@ UserAuth
 UserContext
 UserMapping
 UserOpts
+VMCorruptionType
 VacAttrStats
 VacAttrStatsP
 VacDeadItemsInfo
-- 
2.43.0



  [text/x-patch] v42-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (7.5K, 3-v42-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 4bd5502d07562a0e6ce5cbf315833c5baa676028 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v42 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid this, if a page is already all-frozen or it is
all-visible and no freezing will be attempted, exit early. We can't exit
early if vacuum passed DISABLE_PAGE_SKIPPING, though.
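
The early-exit condition can be sketched in isolation as follows. This is
a simplified standalone rendering, not the code in the patch: the flag
values mirror visibilitymapdefs.h and the heapam.h hunk below, while the
function name and its flattened parameter list are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror visibilitymapdefs.h and the heapam.h change in this patch. */
#define VISIBILITYMAP_ALL_VISIBLE			0x01
#define VISIBILITYMAP_ALL_FROZEN			0x02
#define VISIBILITYMAP_VALID_BITS			0x03
#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)

/*
 * Hypothetical helper: decide whether pruning and freezing can be skipped
 * for a page, given its VM bits, whether the caller will attempt freezing,
 * and the caller's option flags.
 */
bool
sketch_can_use_fast_path(uint8_t old_vmbits, bool attempt_freeze, int options)
{
	/* Only the two defined VM bits may be set. */
	assert((old_vmbits & ~VISIBILITYMAP_VALID_BITS) == 0);

	/* Callers must opt in; DISABLE_PAGE_SKIPPING callers would not. */
	if (!(options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH))
		return false;

	/* An all-frozen page has neither pruning nor freezing work left. */
	if (old_vmbits & VISIBILITYMAP_ALL_FROZEN)
		return true;

	/* An all-visible page has no work left unless we might freeze it. */
	return (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) && !attempt_freeze;
}
```

For example, an all-visible but not all-frozen page qualifies during
on-access pruning (no freezing attempted) but not during a vacuum that
attempts freezing.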

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 97 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 10 +++
 src/include/access/heapam.h          |  1 +
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index be3ae21f94c..22f2d9d9798 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,12 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/*
+	 * True if the page can bypass full page inspection during pruning and
+	 * freezing based on its visibility map status and the caller's options.
+	 */
+	bool		fast_path;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -201,6 +207,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneState *prstate);
 static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 										VMCorruptionType ctype);
+static void prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -329,7 +336,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			params.options = 0;
+			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -398,6 +405,16 @@ prune_freeze_setup(PruneFreezeParams *params,
 												   prstate->block,
 												   &prstate->vmbuffer);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can skip pruning and freezing entirely.
+	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
+	 */
+	prstate->fast_path = ((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+						   !prstate->attempt_freeze)) &&
+		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -915,6 +932,73 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->old_vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling prune_freeze_bypass(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->old_vmbits = prstate->old_vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		ItemId		lp = PageGetItemId(page, off);
+
+		if (!ItemIdIsUsed(lp))
+			continue;
+
+		presult->hastup = true;
+
+		if (ItemIdIsNormal(lp))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -984,6 +1068,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
 									VM_CORRUPT_MISSING_PAGE_HINT);
 
+	/*
+	 * If the visibility map status allows it, bypass pruning and freezing
+	 * entirely. This must be done after fixing any discrepancy between the
+	 * page-level visibility hint and the VM.
+	 */
+	if (prstate.fast_path)
+	{
+		prune_freeze_bypass(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56722556417..1a446050d85 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2044,6 +2044,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
+	/*
+	 * Allow skipping full inspection of pages that the VM indicates are
+	 * already all-frozen (which may be scanned due to SKIP_PAGES_THRESHOLD).
+	 * However, if DISABLE_PAGE_SKIPPING was specified, we can't trust the VM,
+	 * so we must examine the page to make sure it is truly all-frozen and fix
+	 * it otherwise.
+	 */
+	if (vacrel->skipwithvm)
+		params.options |= HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+
 	heap_page_prune_and_freeze(&params,
 							   &presult,
 							   &vacrel->offnum,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 00134012137..305ecc31a9e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
-- 
2.43.0



  [text/x-patch] v42-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (18.2K, 4-v42-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From e0d1a4724fcef6826bdd86c7fc2d068641624b5f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v42 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
also enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 30 ++++++++++-
 src/backend/access/heap/pruneheap.c         | 53 +++++++++---------
 src/backend/access/heap/vacuumlazy.c        | 60 ++++++++++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 19 ++++---
 src/include/access/heapam.h                 |  2 +
 src/include/utils/snapmgr.h                 |  4 +-
 7 files changed, 115 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..c678f5a3c8f 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,31 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used for removal XIDs.
+ *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid,
+								  bool allow_update)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid, allow_update);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
@@ -1354,7 +1379,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after, true))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1420,7 +1445,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
 	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+									   HeapTupleHeaderGetRawXmax(tuple),
+									   true);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 22f2d9d9798..f9db97a6edf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -166,10 +166,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -285,7 +288,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid, true))
 		return;
 
 	/*
@@ -1087,6 +1090,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(prstate.vistest,
+										  prstate.visibility_cutoff_xid,
+										  true))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1289,7 +1304,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 	 * if the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after, true))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1755,29 +1770,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..797973d7bd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,14 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
+										   bool allow_update_vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2090,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2853,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3615,19 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
-
+	/*
+	 * Pass allow_update_vistest as false so that the GlobalVisState
+	 * boundaries used here match those used by the pruning code we are
+	 * cross-checking. Allowing an update could move the boundaries between
+	 * the two calls, causing a spurious assertion failure.
+	 */
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3648,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility. If allow_update_vistest is true,
+ * the boundaries of the GlobalVisState may be updated when checking the
+ * visibility of the newest live XID on the page.
  *
  * Output parameters:
  *
@@ -3661,7 +3669,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
+							   bool allow_update_vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3751,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3760,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3799,20 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+										  allow_update_vistest))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..c461f8dc02d 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisTestIsRemovableXid(vistest, dt->xid, true)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0f913897acc..27e5adeebfb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4223,11 +4223,17 @@ GlobalVisUpdate(void)
  * The state passed needs to have been initialized for the relation fxid is
  * from (NULL is also OK), otherwise the result may not be correct.
  *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated
+ * even if it would otherwise be beneficial. This is useful for callers that
+ * do not want GlobalVisState to advance at all, for example because they need
+ * a conservative answer based on the current boundaries.
+ *
  * See comment for GlobalVisState for details.
  */
 bool
 GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+								FullTransactionId fxid,
+								bool allow_update)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4248,7 +4254,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 	 * might not exist a snapshot considering fxid running. If it makes sense,
 	 * update boundaries and recheck.
 	 */
-	if (GlobalVisTestShouldUpdate(state))
+	if (allow_update && GlobalVisTestShouldUpdate(state))
 	{
 		GlobalVisUpdate();
 
@@ -4268,7 +4274,8 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid,
+							bool allow_update)
 {
 	FullTransactionId fxid;
 
@@ -4282,7 +4289,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, allow_update);
 }
 
 /*
@@ -4296,7 +4303,7 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, true);
 }
 
 /*
@@ -4310,7 +4317,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisTestIsRemovableXid(state, xid, true);
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..8815acccafb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid, bool allow_update);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 8c919d2640e..db903709c49 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -115,8 +115,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid, bool allow_update);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid, bool allow_update);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
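
The once-per-page check described in the 0003 commit message can be sketched outside PostgreSQL as a toy model. This is only an illustration under stated assumptions, not the patch's implementation: ToyHorizon, ToyTuple, xid_considered_running() and page_all_visible() are hypothetical names, and the single-XID horizon is a deliberate simplification of GlobalVisState, which tracks two bounds (maybe_needed and definitely_needed) and may refresh them. The wraparound-aware comparison is modeled on TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;				/* stand-in for TransactionId */

#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

static bool
xid_is_normal(Xid x)
{
	return x >= FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(Xid a, Xid b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical horizon: XIDs older than oldest_running are visible to all. */
typedef struct
{
	Xid			oldest_running;
} ToyHorizon;

/* Hypothetical stand-in for GlobalVisTestXidConsideredRunning(). */
static bool
xid_considered_running(const ToyHorizon *h, Xid x)
{
	/* x is considered running unless it strictly precedes the horizon */
	return !xid_follows(h->oldest_running, x);
}

/* Hypothetical tuple: only the fields this sketch needs. */
typedef struct
{
	Xid			xmin;
	bool		xmin_committed;
} ToyTuple;

/*
 * Scan all tuples, tracking only the newest normal xmin. The horizon is
 * consulted once, after the loop, instead of once per tuple.
 */
static bool
page_all_visible(const ToyTuple *tuples, size_t ntuples,
				 const ToyHorizon *h, Xid *newest_live_xid)
{
	assert(newest_live_xid != NULL);
	*newest_live_xid = INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (!tuples[i].xmin_committed)
			return false;

		if (xid_is_normal(tuples[i].xmin) &&
			xid_follows(tuples[i].xmin, *newest_live_xid))
			*newest_live_xid = tuples[i].xmin;
	}

	/* If the newest xmin is visible to everyone, every older one is too. */
	if (xid_is_normal(*newest_live_xid) &&
		xid_considered_running(h, *newest_live_xid))
		return false;

	return true;
}
```

In this model a page whose tuples carry only frozen (non-normal) xmins never consults the horizon at all, which mirrors the TransactionIdIsNormal() guard in the patch.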



  [text/x-patch] v42-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.5K, 5-v42-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From df7b68460f9b6166e7179640161c3452c712d2a5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v42 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. It also guarantees the value is never stale.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 137 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f9db97a6edf..fd5ff4e4e0a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*
 	 * True if the page can bypass full page inspection during pruning and
 	 * freezing based on its visibility map status and the caller's options.
@@ -166,14 +169,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -183,7 +178,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 
@@ -471,53 +465,42 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -747,7 +730,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1023,9 +1005,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1096,9 +1077,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidConsideredRunning(prstate.vistest,
-										  prstate.visibility_cutoff_xid,
+										  prstate.newest_live_xid,
 										  true))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
@@ -1250,7 +1231,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1711,6 +1692,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1758,32 +1740,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 797973d7bd0..696919e35dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -479,7 +479,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2829,7 +2829,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2855,14 +2855,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2903,7 +2903,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3617,7 +3617,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 	/*
@@ -3630,7 +3630,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3655,7 +3655,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3674,7 +3674,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3684,7 +3684,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3773,9 +3773,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3805,8 +3805,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *newest_live_xid,
 										  allow_update_vistest))
 	{
 		all_visible = false;
-- 
2.43.0
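
The newest-xmin tracking renamed in the patch above can be sketched in isolation. This is a minimal illustration, not the PostgreSQL source: `xid_is_normal()` and `xid_follows()` are stand-ins written to behave like `TransactionIdIsNormal()` and `TransactionIdFollows()`, and `track_newest_live_xid()` is a hypothetical helper that does not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring the definitions in PostgreSQL's transam.h. */
typedef uint32_t TransactionId;
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* True if id1 is logically newer than id2, allowing for 2^32 wraparound. */
bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special XIDs (invalid, bootstrap, frozen) compare as plain integers. */
	if (!xid_is_normal(id1) || !xid_is_normal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical helper: fold one live tuple's xmin into the running newest. */
TransactionId
track_newest_live_xid(TransactionId newest, TransactionId xmin)
{
	/* The running value starts invalid and only ever takes normal XIDs. */
	assert(newest == InvalidTransactionId || xid_is_normal(newest));

	if (xid_follows(xmin, newest) && xid_is_normal(xmin))
		return xmin;
	return newest;
}
```

Because the comparison is circular, an xmin of 10 counts as newer than 4294967290 once the XID counter has wrapped, while a frozen xmin (2) never replaces the running value.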



  [text/x-patch] v42-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.1K, 6-v42-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From 33e0c4824f7405ac5711bacda7c21af28a3ac0be Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v42 05/12] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 245 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 205 insertions(+), 190 deletions(-)
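
As a reading aid for the diff below, here is a rough standalone sketch of how the patch decides which VM bits to set and how it classifies the change for VACUUM's counters. `VmDecision`, `NewlySet`, `will_set_vm()` and `classify_vm_change()` are hypothetical names that stand in for the relevant `PruneState` and `PruneFreezeResult` fields and for `heap_page_will_set_vm()`. The flag values follow visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical, trimmed-down stand-in for the PruneState fields involved. */
typedef struct VmDecision
{
	bool		set_all_visible;
	bool		set_all_frozen;
	uint8_t		old_vmbits;
	uint8_t		new_vmbits;
} VmDecision;

/* Hypothetical stand-in for the three newly_* result flags. */
typedef struct NewlySet
{
	bool		newly_all_visible;
	bool		newly_all_visible_frozen;
	bool		newly_all_frozen;
} NewlySet;

/* Returns true if one or both VM bits need to change; fills new_vmbits. */
bool
will_set_vm(VmDecision *d, bool on_access)
{
	d->new_vmbits = 0;

	/* On-access pruning does not set the VM in this patch. */
	if (on_access || !d->set_all_visible)
		return false;

	d->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (d->set_all_frozen)
		d->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	/* Nothing to do if the VM already holds exactly these bits. */
	if (d->new_vmbits == d->old_vmbits)
	{
		d->new_vmbits = 0;
		return false;
	}
	return true;
}

/* Classify the change the way lazy_scan_prune() counts it. */
NewlySet
classify_vm_change(const VmDecision *d, bool do_set_vm)
{
	NewlySet	r = {false, false, false};

	assert(!d->set_all_frozen || d->set_all_visible);
	if (!do_set_vm)
		return r;

	if ((d->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		r.newly_all_visible = true;
		r.newly_all_visible_frozen = d->set_all_frozen;
	}
	else if ((d->old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 d->set_all_frozen)
		r.newly_all_frozen = true;
	return r;
}
```

A page that was already all-visible and is now also all-frozen counts only as newly all-frozen. A page going from no bits to both bits counts as both newly all-visible and newly all-visible-and-frozen.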

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fd5ff4e4e0a..04a0580e313 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -163,21 +182,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 
@@ -232,6 +236,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -398,6 +403,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -917,6 +923,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * is not being attempted, there is no remaining work and we can bypass the
@@ -950,8 +992,6 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -986,7 +1026,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -1001,12 +1042,10 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,8 +1073,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1128,6 +1169,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1149,14 +1215,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1170,6 +1239,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1177,29 +1267,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1209,33 +1282,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible(). It's also a
+	 * valuable cross-check of the page state after pruning and freezing.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 696919e35dd..23deabd8c01 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   bool allow_update_vistest,
@@ -2022,8 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2074,32 +2065,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2120,6 +2085,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2133,71 +2109,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3613,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8815acccafb..e123dda090f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0
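
The record-wide snapshot conflict horizon computed in the patch above is the newest of the horizons required by each change the record actually makes. A minimal sketch of that rule follows, under the assumption that `xid_follows()` behaves like `TransactionIdFollows()`. `record_conflict_xid()` is a hypothetical function name, and its parameters stand in for `prstate.newest_live_xid`, `prstate.pagefrz.FreezePageConflictXid` and `prstate.latest_xid_removed`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Stand-in for TransactionIdFollows(): circular "is newer than" test. */
bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical helper: pick the newest horizon among the applied changes. */
TransactionId
record_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
					bool do_freeze, TransactionId freeze_conflict_xid,
					bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Setting the VM conflicts with snapshots older than the newest xmin. */
	if (do_set_vm)
		conflict_xid = newest_live_xid;
	/* Freezing and pruning each contribute only if they really happen. */
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* A record that changes nothing needs no conflict horizon. */
	assert(do_set_vm || do_freeze || do_prune ||
		   conflict_xid == InvalidTransactionId);
	return conflict_xid;
}
```

Horizons belonging to changes that are not applied are ignored, so a large `newest_live_xid` has no effect when the VM is left alone.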



  [text/x-patch] v42-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (5.6K, 7-v42-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 45ceda895d959ec957b8dd99155abbd221c9d52d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v42 06/12] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so if we make this change we can remove all of the
XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 29 +++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++++-----------
 2 files changed, 45 insertions(+), 28 deletions(-)
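
The full-page-image handling that this patch changes in log_heap_prune_and_freeze() can be summarized by the sketch below. It is an illustration only: `heap_regbuf_flags()` and `should_stamp_page_lsn()` are hypothetical helpers, the boolean parameters stand in for `XLogRecPtrIsValid(PageGetLSN(page))` and `XLogHintBitIsNeeded()`, the flag values are illustrative, and any base flags the real function sets are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real definitions live in xloginsert.h. */
#define REGBUF_FORCE_IMAGE	0x01
#define REGBUF_NO_IMAGE		0x02

/* Hypothetical helper: choose the image-related flags for the heap buffer. */
uint8_t
heap_regbuf_flags(bool page_lsn_valid, bool do_prune, int nfrozen,
				  bool do_set_vm, bool hint_bits_need_wal)
{
	uint8_t		flags = 0;

	assert(nfrozen >= 0);

	if (!page_lsn_valid)
	{
		/* Never WAL-logged: replay needs an image to find an initialized page. */
		flags |= REGBUF_FORCE_IMAGE;
	}
	else if (!do_prune && nfrozen == 0 &&
			 (!do_set_vm || !hint_bits_need_wal))
	{
		/* The only heap change is PD_ALL_VISIBLE and hints are not logged. */
		flags |= REGBUF_NO_IMAGE;
	}
	return flags;
}

/* The page LSN is advanced unless an FPI was explicitly skipped. */
bool
should_stamp_page_lsn(uint8_t flags)
{
	return (flags & REGBUF_NO_IMAGE) == 0;
}
```

Deriving the LSN decision from the flags, rather than repeating the condition, keeps the two from drifting apart. It also means that a forced image for a never-logged page always stamps the LSN.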

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 04a0580e313..48d5d9fb906 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -2545,6 +2545,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	uint8		info;
 	uint8		regbuf_flags_heap;
 
+	Page		heap_page = BufferGetPage(buffer);
+
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
 	xlhp_freeze_plans freeze_plans;
@@ -2563,14 +2565,18 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	/*
 	 * We can avoid an FPI of the heap page if the only modification we are
 	 * making to it is to set PD_ALL_VISIBLE and checksums/wal_log_hints are
-	 * disabled. Note that if we explicitly skip an FPI, we must not stamp the
-	 * heap page with this record's LSN. Recovery skips records <= the stamped
-	 * LSN, so this could lead to skipping an earlier FPI needed to repair a
-	 * torn page.
+	 * disabled.
+	 *
+	 * However, if the page has never been WAL-logged (LSN is invalid), we
+	 * must force an FPI regardless.  This can happen when another backend
+	 * extends the heap, initializes the page, and then fails before WAL-
+	 * logging it.  Since heap extension is not WAL-logged, recovery might try
+	 * to replay our record and find that the page isn't initialized, which
+	 * would cause a PANIC.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+	if (!XLogRecPtrIsValid(PageGetLSN(heap_page)))
+		regbuf_flags_heap |= REGBUF_FORCE_IMAGE;
+	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2685,12 +2691,13 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * See comment at the top of the function about regbuf_flags_heap for
-	 * details on when we can advance the page LSN.
+	 * If we explicitly skip an FPI, we must not stamp the heap page with this
+	 * record's LSN. Recovery skips records <= the stamped LSN, so this could
+	 * lead to skipping an earlier FPI needed to repair a torn page.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (!(regbuf_flags_heap & REGBUF_NO_IMAGE))
 	{
 		Assert(BufferIsDirty(buffer));
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(heap_page, recptr);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23deabd8c01..63e6199241c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1929,33 +1929,43 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
 			MarkBufferDirty(buf);
 
+			PageSetAllVisible(page);
+			PageClearPrunable(page);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
 			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
 			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				!XLogRecPtrIsValid(PageGetLSN(page)))
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 
-			PageSetAllVisible(page);
-			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
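
The log_heap_prune_and_freeze() hunk above ties two decisions together:
which image flag the heap buffer is registered with, and whether the heap
page is stamped with the record's LSN afterwards. A minimal standalone
sketch of that rule follows. The SKETCH_* names and flag values are
hypothetical stand-ins rather than the real REGBUF_* definitions, the
boolean parameters stand in for the XLogRecPtrIsValid(PageGetLSN(...)) and
XLogHintBitIsNeeded() checks, and starting from a "standard" base flag is
an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for REGBUF_* flags; the values are illustrative. */
#define SKETCH_REGBUF_STANDARD     0x01
#define SKETCH_REGBUF_FORCE_IMAGE  0x02
#define SKETCH_REGBUF_NO_IMAGE     0x04

/*
 * Choose registration flags for the heap buffer, mirroring the if/else-if
 * chain in the hunk: a page that was never WAL-logged gets a forced image;
 * a record that only sets the VM when hint-bit WAL is not needed gets no
 * image at all.
 */
static uint8_t
sketch_heap_regbuf_flags(bool page_lsn_valid, bool do_prune, int nfrozen,
                         bool do_set_vm, bool hint_bits_need_wal)
{
    uint8_t flags = SKETCH_REGBUF_STANDARD;   /* assumed base flag */

    assert(nfrozen >= 0);

    if (!page_lsn_valid)
        flags |= SKETCH_REGBUF_FORCE_IMAGE;
    else if (!do_prune && nfrozen == 0 &&
             (!do_set_vm || !hint_bits_need_wal))
        flags |= SKETCH_REGBUF_NO_IMAGE;

    return flags;
}

/*
 * The heap page LSN is advanced only when the full-page image was not
 * explicitly suppressed.
 */
static bool
sketch_should_stamp_heap_lsn(uint8_t flags)
{
    return (flags & SKETCH_REGBUF_NO_IMAGE) == 0;
}
```

Deriving the LSN decision from the flag itself, instead of repeating the
do_prune/nfrozen/do_set_vm condition, keeps the two from drifting apart.
It also covers the new forced-image case: a page with no valid LSN is
stamped even when the record does nothing but set the VM.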



  [text/x-patch] v42-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.5K, 8-v42-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 7061229f052018ecda5ccc31445509ccd5bbef2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v42 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 150 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 63 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d64c403f2f0 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * XXX: We should consider not masking PD_ALL_VISIBLE during WAL
+	 * consistency checking.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 48d5d9fb906..ea1afa5c58a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1256,8 +1256,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 63e6199241c..f698c2d899b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1939,11 +1939,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2821,9 +2821,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..21e89c38f0a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,31 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if they
+ * are set, we may skip vacuuming pages and incorrectly advance relfrozenxid
+ * or skip reading heap pages for an index-only scan and return wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +231,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +251,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..adc858c2a97 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4421,7 +4421,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
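
With the WAL-logging variant gone, what remains of visibilitymap_set() is
bit arithmetic on a pinned, exclusively locked VM buffer. The toy model
below sketches that arithmetic on a plain byte array. It assumes two bits
per heap block, hence four heap blocks per map byte, which is consistent
with VISIBILITYMAP_VALID_BITS being 0x03. The SketchVM type, the SKETCH_*
names, and the 64-byte map size are hypothetical, and the real
HEAPBLK_TO_MAPBYTE/HEAPBLK_TO_OFFSET macros also account for the map
block, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values match visibilitymapdefs.h as shown in the patch. */
#define SKETCH_ALL_VISIBLE  0x01
#define SKETCH_ALL_FROZEN   0x02
#define SKETCH_VALID_BITS   0x03

/* Assumed layout: 2 bits per heap block, so 4 heap blocks per byte. */
#define SKETCH_BITS_PER_HEAPBLOCK   2
#define SKETCH_HEAPBLOCKS_PER_BYTE  4
#define SKETCH_MAP_BYTES            64	/* toy size, not a real VM page */

typedef struct SketchVM
{
    uint8_t map[SKETCH_MAP_BYTES];
} SketchVM;

static void
sketch_vm_init(SketchVM *vm)
{
    memset(vm->map, 0, sizeof(vm->map));
}

/* Return the two status bits recorded for heap_blk. */
static uint8_t
sketch_vm_get(const SketchVM *vm, uint32_t heap_blk)
{
    uint32_t map_byte = heap_blk / SKETCH_HEAPBLOCKS_PER_BYTE;
    uint8_t  map_offset = (heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
                          SKETCH_BITS_PER_HEAPBLOCK;

    assert(map_byte < SKETCH_MAP_BYTES);
    return (vm->map[map_byte] >> map_offset) & SKETCH_VALID_BITS;
}

/* OR the given flags into the bits for heap_blk; bits are never cleared. */
static void
sketch_vm_set(SketchVM *vm, uint32_t heap_blk, uint8_t flags)
{
    uint32_t map_byte = heap_blk / SKETCH_HEAPBLOCKS_PER_BYTE;
    uint8_t  map_offset = (heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
                          SKETCH_BITS_PER_HEAPBLOCK;

    assert(map_byte < SKETCH_MAP_BYTES);
    assert((flags & SKETCH_VALID_BITS) == flags);
    /* all-frozen must never be set without all-visible */
    assert(flags != SKETCH_ALL_FROZEN);

    vm->map[map_byte] |= (uint8_t) (flags << map_offset);
}
```

In this model heap block 5 lands in byte 1 at bit offset 2, so setting both
flags for it ORs 0x0C into that byte and leaves its neighbours untouched.
Locking, dirtying the buffer, and setting the page LSN are left to the
caller, as in the callers this patch converts.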



  [text/x-patch] v42-0008-Track-which-relations-are-modified-by-a-query.patch (8.7K, 9-v42-0008-Track-which-relations-are-modified-by-a-query.patch)
From 40415fe2723303786248a1a5d53389c48216d6da Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v42 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the PlannedStmt.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c        | 47 ++++++++++++++++++++++++++
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  4 +++
 src/backend/executor/nodeModifyTable.c | 18 ++++++++++
 src/backend/optimizer/plan/planner.c   | 21 +++++++++++-
 src/include/nodes/plannodes.h          | 10 ++++++
 6 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..3f134f9a34d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -90,6 +90,9 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
 										 Bitmapset *modifiedCols,
 										 AclMode requiredPerms);
 static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
+#ifdef USE_ASSERT_CHECKING
+static void ExecCheckModifiedRelIds(EState *estate);
+#endif
 static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
 										TupleTableSlot *slot,
@@ -827,6 +830,46 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 }
 
 
+/*
+ * ExecCheckModifiedRelIds
+ *		Verify that every relation the executor actually opened for modification
+ *		or row locking is present in the planner's modifiedRelids.
+ *
+ * The planner's set may be a superset of what the executor touches, because it
+ * includes partitions that were pruned at runtime and parent row marks that the
+ * executor skips.
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+ExecCheckModifiedRelIds(EState *estate)
+{
+	PlannedStmt *plannedstmt = estate->es_plannedstmt;
+	Bitmapset  *executor_relids = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);
+
+		if (rri->ri_RangeTableIndex != 0)
+			executor_relids = bms_add_member(executor_relids,
+											 rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (int i = 0; i < estate->es_range_table_size; i++)
+		{
+			if (estate->es_rowmarks[i] != NULL)
+				executor_relids = bms_add_member(executor_relids,
+												 estate->es_rowmarks[i]->rti);
+		}
+	}
+	Assert(bms_is_subset(executor_relids, plannedstmt->modifiedRelids));
+	bms_free(executor_relids);
+}
+#endif
+
+
 /* ----------------------------------------------------------------
  *		InitPlan
  *
@@ -992,6 +1035,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	ExecCheckModifiedRelIds(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..d67f24fca8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,10 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		/* verify this relation is in the planner's modifiedRelids */
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..6b4ee4f9378 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,16 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * Verify this relation is in the planner's set of modified relations.
+	 * Partitions opened by tuple routing have ri_RangeTableIndex == 0 because
+	 * they have no range table entry, so we can only check relations that are
+	 * in the range table.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1533,10 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2219,10 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..847af979e31 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.  This is a
+	 * superset of what the executor will actually modify/lock at runtime,
+	 * because runtime partition pruning may eliminate some result relations,
+	 * and parent row marks are included here but skipped by the executor.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+
+		modifiedRelids = bms_add_member(modifiedRelids, rc->rti);
+	}
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..841c7707c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,16 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 *
+	 * Computed by the planner, this is a superset of what the executor will
+	 * actually touch at runtime, because it includes partitions that may be
+	 * pruned and parent row marks that the executor skips.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0



  [text/x-patch] v42-0009-Thread-flags-through-begin-scan-APIs.patch (32.9K, 10-v42-0009-Thread-flags-through-begin-scan-APIs.patch)
From f1c3a40ff3fa8b5f63073b13306082c880ef1c06 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v42 09/12] Thread flags through begin-scan APIs

Add a caller-settable flags parameter to the table_beginscan_* wrappers,
index_beginscan(), index_beginscan_parallel(), table_index_fetch_begin(),
and the table AM callback index_fetch_begin(). This allows callers to pass
additional context to be used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)
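
The split between internally set flags and caller-provided flags can be
sketched in isolation. This is a minimal illustration, not the patch's code:
every DEMO_* name, the DemoScanDesc struct, and the flag values are
hypothetical, and it assumes the common begin-scan routine rejects internal
bits in the caller's argument and then ORs the two sets together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; names and values are illustrative only. */
#define DEMO_TYPE_SEQSCAN	(UINT32_C(1) << 0)
#define DEMO_ALLOW_PAGEMODE	(UINT32_C(1) << 1)
#define DEMO_TEMP_SNAPSHOT	(UINT32_C(1) << 2)
#define DEMO_CALLER_HINT	(UINT32_C(1) << 3)	/* a caller-supplied hint */

/* Bits that only the begin-scan wrappers themselves may set. */
#define DEMO_INTERNAL_FLAGS \
	(DEMO_TYPE_SEQSCAN | DEMO_ALLOW_PAGEMODE | DEMO_TEMP_SNAPSHOT)

typedef struct DemoScanDesc
{
	uint32_t	flags;
} DemoScanDesc;

/* Caller-provided flags are valid only if they contain no internal bit. */
static int
demo_user_flags_valid(uint32_t user_flags)
{
	return (user_flags & DEMO_INTERNAL_FLAGS) == 0;
}

/* Common routine: validate the caller's flags, then combine both sets. */
static DemoScanDesc
demo_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	DemoScanDesc scan;

	assert(demo_user_flags_valid(user_flags));
	scan.flags = internal_flags | user_flags;
	return scan;
}

/* Wrapper: chooses the internal flags itself and forwards the caller's. */
static DemoScanDesc
demo_beginscan(uint32_t user_flags)
{
	uint32_t	internal_flags = DEMO_TYPE_SEQSCAN | DEMO_ALLOW_PAGEMODE;

	return demo_beginscan_common(internal_flags, user_flags);
}
```

Under these assumptions, existing call sites keep their behavior by passing 0,
which matches how the call sites in this patch are updated.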

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..29d7c3514b6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 219f604df7b..ec9bbfe554a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22882,7 +22882,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23346,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..decfd792809 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -790,7 +791,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +857,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..a37fa9abece 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1726,7 +1728,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1792,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e123dda090f..c6aec63a505 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
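
The flag handling in the tableam.h hunks above can be illustrated in isolation. What follows is a minimal sketch, not code from the patch: every DEMO_/demo_ name and every bit value is a hypothetical stand-in, and the real ScanOptions enum has more members. It only shows the pattern of rejecting caller-supplied bits that overlap an internal mask and then merging the two flag sets, as table_beginscan_common() does with SO_INTERNAL_FLAGS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few ScanOptions bits; values are illustrative. */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_STRAT = 1 << 1,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 2,
	DEMO_SO_HINT = 1 << 3		/* a caller-settable hint bit */
};

/* Bits that only the begin-scan wrappers are allowed to set. */
#define DEMO_SO_INTERNAL_FLAGS \
	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_STRAT | DEMO_SO_ALLOW_PAGEMODE)

/* Returns nonzero if the caller passed only caller-settable bits. */
static int
demo_user_flags_ok(uint32_t user_flags)
{
	return (user_flags & DEMO_SO_INTERNAL_FLAGS) == 0;
}

/* Same shape as the patched common function: validate, then merge. */
static uint32_t
demo_merge_flags(uint32_t internal_flags, uint32_t user_flags)
{
	assert(demo_user_flags_ok(user_flags));
	return internal_flags | user_flags;
}
```

Keeping the wrapper-chosen bits and the caller-chosen bits in separate parameters until the merge means a caller that passes a scan-type bit by mistake trips the assertion in debug builds instead of silently changing the scan type.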



  [text/x-patch] v42-0010-Pass-down-information-on-table-modification-to-s.patch (11.3K, 11-v42-0010-Pass-down-information-on-table-modification-to-s.patch)
From 6b34d8f1380b7ba224c6e240289ca93705005a66 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v42 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           | 10 ++++++++++
 8 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index decfd792809..b977719c295 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -791,7 +794,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -857,7 +863,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a37fa9abece..ad460c11679 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1728,7 +1734,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1792,7 +1800,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..31c4192b67e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,16 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ */
+static inline bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
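
The way each scan node in the patch above derives its flags can be sketched without any executor types. This is an illustration under stated assumptions, not the patch's code: a 64-bit mask stands in for the Bitmapset that the patch reads from es_plannedstmt->modifiedRelids, and all demo_/DEMO_ names are hypothetical. The hint bit value matches the 1 << 10 added to ScanOptions in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for a Bitmapset: bit i set means the relation at
 * range-table index i is modified by the query.
 */
typedef uint64_t demo_relid_set;

static bool
demo_relid_is_member(unsigned relid, demo_relid_set set)
{
	return relid < 64 && ((set >> relid) & 1u) != 0;
}

/* Analogue of ScanRelIsReadOnly(): true if the scanned rel is not modified. */
static bool
demo_scan_rel_is_read_only(unsigned scanrelid, demo_relid_set modified)
{
	return !demo_relid_is_member(scanrelid, modified);
}

/* What each scan node computes before calling its begin-scan function. */
static uint32_t
demo_scan_flags(unsigned scanrelid, demo_relid_set modified)
{
	return demo_scan_rel_is_read_only(scanrelid, modified) ?
		DEMO_SO_HINT_REL_READ_ONLY : 0;
}
```

In this sketch, a query that modifies range-table entry 2 while also scanning entry 1 would pass the hint for the scan of entry 1 only.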



  [text/x-patch] v42-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 12-v42-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From a6d391c12f03706e8d9feb07c7cd647d91594cf2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v42 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which
triggers a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea1afa5c58a..c5647b1494b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -236,7 +238,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -257,7 +260,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -339,6 +343,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -395,6 +401,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -474,9 +481,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -930,21 +936,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning or
+	 * freezing, avoid setting the visibility map if doing so would newly
+	 * dirty the heap page or, when the page is already dirty, would require
+	 * a full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1171,7 +1193,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c6aec63a505..90ca5a2cfa8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. If the
+	 * query does not modify the underlying heap table, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
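
The decision made by heap_page_will_set_vm() in the patch above can be read as a pure function of a few booleans. The sketch below is a simplified model, not the patch's implementation: the struct, the enum, and the DEMO_ bit values are hypothetical, the two buffer checks (BufferIsDirty and XLogCheckBufferNeedsBackup in the patch) are replaced by plain fields, and the function returns the VM bits directly instead of storing them in the prune state.

```c
#include <stdbool.h>

typedef enum
{
	DEMO_PRUNE_ON_ACCESS,
	DEMO_PRUNE_VACUUM_SCAN
} demo_prune_reason;

/* Hypothetical, flattened view of the state the decision depends on. */
typedef struct
{
	bool		attempt_set_vm;		/* caller opted in to setting the VM */
	bool		set_all_visible;	/* page turned out to be all-visible */
	bool		set_all_frozen;		/* page turned out to be all-frozen */
	bool		buffer_is_dirty;	/* heap buffer is already dirty */
	bool		needs_fpi;			/* WAL-logging would need a full-page image */
} demo_prune_state;

#define DEMO_VM_ALL_VISIBLE 0x01
#define DEMO_VM_ALL_FROZEN  0x02

/* Returns the VM bits to set, or 0 to leave the VM alone. */
static int
demo_vm_bits_to_set(demo_prune_state *st, demo_prune_reason reason,
					bool do_prune, bool do_freeze)
{
	if (!st->attempt_set_vm || !st->set_all_visible)
		return 0;

	/*
	 * On-access and otherwise a no-op: skip the VM update if it would newly
	 * dirty the page or force a full-page image.
	 */
	if (reason == DEMO_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_is_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return 0;
	}

	return DEMO_VM_ALL_VISIBLE |
		(st->set_all_frozen ? DEMO_VM_ALL_FROZEN : 0);
}
```

In this model the cost checks apply only to on-access calls that would otherwise leave the page untouched; a call that is already pruning or freezing, or any vacuum call, proceeds to set the bits.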



  [text/x-patch] v42-0012-Set-pd_prune_xid-on-insert.patch (8.8K, 13-v42-0012-Set-pd_prune_xid-on-insert.patch)
From cf680166b60099ca720fe70820034d3bf3837df9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v42 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time it is read after being filled with newly inserted
tuples. The page is therefore set all-visible while it is still in
shared buffers, avoiding potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also enables index-only
scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5647b1494b..07e47d8927b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -279,7 +279,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1922,17 +1923,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
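
For readers following the patch above, here is a minimal, self-contained sketch of the idea, not the patch's actual code. It assumes a toy page that carries only a pd_prune_xid field and a single XID horizon standing in for GlobalVisState. ToyPage, toy_insert(), toy_page_set_prunable() and toy_should_try_prune() are hypothetical names invented for illustration. The free-space checks that heap_page_prune_opt() also performs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Toy page: only the prune hint is modeled (hypothetical type). */
typedef struct
{
	TransactionId pd_prune_xid;
} ToyPage;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 comparison, valid for normal XIDs only. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID seen as the hint, in the spirit of PageSetPrunable(). */
static void
toy_page_set_prunable(ToyPage *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Insert path: set the hint unless the XID is not a normal one (as in
 * bootstrap mode) or the tuple is inserted frozen. Tuple placement itself
 * is not modeled.
 */
static void
toy_insert(ToyPage *page, TransactionId xid, bool insert_frozen)
{
	if (xid_is_normal(xid) && !insert_frozen)
		toy_page_set_prunable(page, xid);
}

/*
 * Read path: pruning is worth attempting only if a hint is set and it is
 * older than the horizon, i.e. no snapshot considers that XID running.
 */
static bool
toy_should_try_prune(const ToyPage *page, TransactionId horizon)
{
	if (page->pd_prune_xid == InvalidTransactionId)
		return false;
	return xid_precedes(page->pd_prune_xid, horizon);
}
```

In this model, a page filled only by inserts now carries a hint, so the first reader that arrives after the inserting XID has fallen behind the horizon gets a chance to prune the page and, with the earlier patches in the set, mark it all-visible.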



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-22 19:58  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-22 19:58 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Mar 20, 2026 at 7:37 PM Melanie Plageman
<[email protected]> wrote:
>
> I've made several minor updates and two notable updates in attached v42:
>
> - no separate log_newpage_buffer() for empty page vacuum.
> log_heap_prune_and_freeze() now handles pages without a valid LSN on
> its own
> - the heap_page_is_all_visible() assertion should be stable even once
> it uses GlobalVisState because I've updated the GlobalVisState
> functions to avoid updating the GlobalVisState boundaries in this case

I've pushed the first two patches. Attached are the remaining 10. No
changes were made to those from the previous version.

- Melanie


Attachments:

  [text/x-patch] v43-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (18.2K, 2-v43-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
  download | inline diff:
From 35251d668c2efdc82f6a40198272fa7ee5afe82a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v43 01/10] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
also enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 30 ++++++++++-
 src/backend/access/heap/pruneheap.c         | 53 +++++++++---------
 src/backend/access/heap/vacuumlazy.c        | 60 ++++++++++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 19 ++++---
 src/include/access/heapam.h                 |  2 +
 src/include/utils/snapmgr.h                 |  4 +-
 7 files changed, 115 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..c678f5a3c8f 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,31 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid,
+								  bool allow_update)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid, allow_update);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
@@ -1354,7 +1379,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after, true))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1420,7 +1445,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
 	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+									   HeapTupleHeaderGetRawXmax(tuple),
+									   true);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b383b0fca8b..718f3a78c46 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -160,10 +160,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -281,7 +284,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid, true))
 		return;
 
 	/*
@@ -1081,6 +1084,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(prstate.vistest,
+										  prstate.visibility_cutoff_xid,
+										  true))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,7 +1298,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 	 * if the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after, true))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1749,29 +1764,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..797973d7bd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,14 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
+										   bool allow_update_vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2090,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2853,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3615,19 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
-
+	/*
+	 * Pass allow_update_vistest as false so that the GlobalVisState
+	 * boundaries used here match those used by the pruning code we are
+	 * cross-checking. Allowing an update could move the boundaries between
+	 * the two calls, causing a spurious assertion failure.
+	 */
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3648,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility. If allow_update_vistest is true,
+ * the boundaries of the GlobalVisState may be updated when checking the
+ * visibility of the newest live XID on the page.
  *
  * Output parameters:
  *
@@ -3661,7 +3669,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
+							   bool allow_update_vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3751,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3760,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3799,20 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+										  allow_update_vistest))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..c461f8dc02d 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisTestIsRemovableXid(vistest, dt->xid, true)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0f913897acc..27e5adeebfb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4223,11 +4223,17 @@ GlobalVisUpdate(void)
  * The state passed needs to have been initialized for the relation fxid is
  * from (NULL is also OK), otherwise the result may not be correct.
  *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated
+ * even if it would otherwise be beneficial. This is useful for callers that
+ * do not want GlobalVisState to advance at all, for example because they need
+ * a conservative answer based on the current boundaries.
+ *
  * See comment for GlobalVisState for details.
  */
 bool
 GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+								FullTransactionId fxid,
+								bool allow_update)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4248,7 +4254,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 	 * might not exist a snapshot considering fxid running. If it makes sense,
 	 * update boundaries and recheck.
 	 */
-	if (GlobalVisTestShouldUpdate(state))
+	if (allow_update && GlobalVisTestShouldUpdate(state))
 	{
 		GlobalVisUpdate();
 
@@ -4268,7 +4274,8 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid,
+							bool allow_update)
 {
 	FullTransactionId fxid;
 
@@ -4282,7 +4289,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, allow_update);
 }
 
 /*
@@ -4296,7 +4303,7 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, true);
 }
 
 /*
@@ -4310,7 +4317,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisTestIsRemovableXid(state, xid, true);
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..8815acccafb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid, bool allow_update);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 8c919d2640e..db903709c49 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -115,8 +115,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid, bool allow_update);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid, bool allow_update);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
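
The "check once per page" scheme described in the commit message above can be sketched roughly as follows. This is a simplified model, not the patch's code: it assumes every tuple on the page is live with a committed inserter, and it replaces GlobalVisState with a single horizon plus a counter of how many times the (notionally expensive) test runs. ToyVisState, toy_xid_considered_running() and toy_page_is_all_visible() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for GlobalVisState: one horizon, plus a call counter. */
typedef struct
{
	TransactionId horizon;	/* XIDs preceding this are visible to everyone */
	int			checks;		/* number of horizon tests performed */
} ToyVisState;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 comparison, valid for normal XIDs only. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) > 0;
}

/* True if some snapshot might still consider xid running. */
static bool
toy_xid_considered_running(ToyVisState *vis, TransactionId xid)
{
	vis->checks++;
	return !xid_follows(vis->horizon, xid);
}

/*
 * Scan all tuple xmins, tracking only the newest normal one. The horizon
 * test runs once, after the loop: if the newest xmin is visible to all,
 * every older xmin on the page is too.
 */
static bool
toy_page_is_all_visible(const TransactionId *xmins, int ntuples,
						ToyVisState *vis, TransactionId *newest_xmin)
{
	*newest_xmin = InvalidTransactionId;

	for (int i = 0; i < ntuples; i++)
	{
		if (xid_is_normal(xmins[i]) &&
			(!xid_is_normal(*newest_xmin) ||
			 xid_follows(xmins[i], *newest_xmin)))
			*newest_xmin = xmins[i];
	}

	if (xid_is_normal(*newest_xmin) &&
		toy_xid_considered_running(vis, *newest_xmin))
		return false;

	return true;
}
```

The counter makes the cost argument visible: the horizon test runs at most once per page regardless of how many tuples it holds, and not at all when every xmin is frozen.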



  [text/x-patch] v43-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.4K, 3-v43-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
  download | inline diff:
From 204a645f106b3e212cac17734b313675a6236bed Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v43 02/10] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 137 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 718f3a78c46..cebd78603cb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,14 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -177,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /*
@@ -458,53 +452,42 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -734,7 +717,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1012,9 +994,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1090,9 +1071,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidConsideredRunning(prstate.vistest,
-										  prstate.visibility_cutoff_xid,
+										  prstate.newest_live_xid,
 										  true))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
@@ -1244,7 +1225,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1705,6 +1686,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1752,32 +1734,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 797973d7bd0..696919e35dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -479,7 +479,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2829,7 +2829,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2855,14 +2855,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2903,7 +2903,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3617,7 +3617,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 	/*
@@ -3630,7 +3630,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3655,7 +3655,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3674,7 +3674,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3684,7 +3684,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3773,9 +3773,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3805,8 +3805,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *newest_live_xid,
 										  allow_update_vistest))
 	{
 		all_visible = false;
-- 
2.43.0
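
Reader's aside on the rename above, not part of the patch: the "track
newest xmin" step that appears in both
heap_prune_record_unchanged_lp_normal() and
heap_page_would_be_all_visible() can be sketched standalone. The helper
names below (xid_is_normal, xid_follows, track_newest_live_xid) are
hypothetical stand-ins modeled on TransactionIdIsNormal() and
TransactionIdFollows(); the typedef and constants are simplified
assumptions rather than the real headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's definitions (assumption). */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical helper: XIDs below 3 are special (invalid/bootstrap/frozen). */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/*
 * Hypothetical helper: is id1 logically newer than id2?  Normal XIDs are
 * compared modulo 2^32; special XIDs sort before all normal ones.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!xid_is_normal(id1) || !xid_is_normal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Fold one live tuple's xmin into the running newest live XID. */
static void
track_newest_live_xid(TransactionId *newest_live_xid, TransactionId xmin)
{
	assert(newest_live_xid);

	if (xid_follows(xmin, *newest_live_xid) && xid_is_normal(xmin))
		*newest_live_xid = xmin;
}
```

Starting from InvalidTransactionId, feeding xmins 100 and then 50 leaves
the tracked value at 100. A frozen xmin (2) is never recorded. Near
wraparound, feeding 10 after 4294967290 replaces the tracked value with
10, because 10 is 16 XIDs newer modulo 2^32.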



  [text/x-patch] v43-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.1K, 4-v43-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From 58b3187b1bb54585c5b81e261678fb448ad9cea0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v43 03/10] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 245 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 205 insertions(+), 190 deletions(-)
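
Reader's note, not part of the commit: a minimal standalone sketch of
the two decisions this patch moves into heap_page_prune_and_freeze(),
namely whether to set the VM and which snapshot conflict horizon the
combined record carries. SketchPruneState, sketch_will_set_vm() and
sketch_conflict_xid() are hypothetical names; the struct keeps only the
fields the logic needs, and freeze_conflict_xid stands in for
pagefrz.FreezePageConflictXid. The XID helpers are simplified
assumptions modeled on TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Same bit values as PostgreSQL's visibility map flags. */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical, trimmed-down stand-in for PruneState. */
typedef struct SketchPruneState
{
	bool		set_all_visible;
	bool		set_all_frozen;
	uint8_t		old_vmbits;
	uint8_t		new_vmbits;
	TransactionId newest_live_xid;
	TransactionId freeze_conflict_xid;
	TransactionId latest_xid_removed;
} SketchPruneState;

/* Wraparound-aware "id1 is newer than id2" (assumed semantics). */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Decide whether the VM bits change; leaves new_vmbits at 0 if not. */
static bool
sketch_will_set_vm(SketchPruneState *ps, bool on_access)
{
	assert(!ps->set_all_frozen || ps->set_all_visible);

	ps->new_vmbits = 0;
	if (on_access || !ps->set_all_visible)
		return false;

	ps->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (ps->set_all_frozen)
		ps->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	if (ps->new_vmbits == ps->old_vmbits)
	{
		ps->new_vmbits = 0;		/* VM already in the desired state */
		return false;
	}
	return true;
}

/* The record's horizon is the newest XID required by any change in it. */
static TransactionId
sketch_conflict_xid(const SketchPruneState *ps,
					bool do_set_vm, bool do_freeze, bool do_prune)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = ps->newest_live_xid;
	if (do_freeze && xid_follows(ps->freeze_conflict_xid, conflict_xid))
		conflict_xid = ps->freeze_conflict_xid;
	if (do_prune && xid_follows(ps->latest_xid_removed, conflict_xid))
		conflict_xid = ps->latest_xid_removed;
	return conflict_xid;
}
```

For example, with newest_live_xid = 100, freeze_conflict_xid = 90 and
latest_xid_removed = 120, a record that sets the VM and prunes gets
horizon 120, while one that sets the VM and freezes gets 100. A page
whose VM bits already equal the computed bits reports no VM change, so
no VM block would be attached to the record.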

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cebd78603cb..c43b192b163 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /*
@@ -228,6 +232,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -395,6 +400,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -906,6 +912,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for the heap page using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * won't be attempted, there is no remaining work and we can use the fast path
@@ -939,8 +981,6 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -975,7 +1015,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -990,12 +1031,10 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1023,8 +1062,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1122,6 +1163,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1143,14 +1209,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1164,6 +1233,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1171,29 +1261,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1203,33 +1276,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible(). It's also a
+	 * valuable cross-check of the page state after pruning and freezing.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * not set all-frozen in the VM and the caller did not pass
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 696919e35dd..23deabd8c01 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   bool allow_update_vistest,
@@ -2022,8 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2074,32 +2065,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2120,6 +2085,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2133,71 +2109,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3613,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8815acccafb..e123dda090f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether the page was newly set all-visible, all-visible and all-frozen,
+	 * or (already all-visible) all-frozen during phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0
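
A note on the three newly_* flags in PruneFreezeResult above: the sketch
below models how a VM transition could be classified. It mirrors the
counting logic this patch removes from lazy_scan_prune(). It is not code
from the patch: VmTransition and classify_vm_transition() are hypothetical
names, and whether pruneheap.c computes the flags in exactly this way is an
assumption. Only the two bit values are taken from visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the three PruneFreezeResult flags */
typedef struct VmTransition
{
	bool		newly_all_visible;
	bool		newly_all_visible_frozen;
	bool		newly_all_frozen;
} VmTransition;

/*
 * Classify the change from old_vmbits to new_vmbits, following the
 * if/else chain that used to live at the end of lazy_scan_prune().
 */
static VmTransition
classify_vm_transition(uint8_t old_vmbits, uint8_t new_vmbits)
{
	VmTransition t = {false, false, false};
	bool		set_all_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;

	/* Must never set all-frozen without also setting all-visible */
	assert(new_vmbits != VISIBILITYMAP_ALL_FROZEN);

	/* Nothing to do: bits unchanged, or page is not being set all-visible */
	if (old_vmbits == new_vmbits ||
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
		return t;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		t.newly_all_visible = true;
		if (set_all_frozen)
			t.newly_all_visible_frozen = true;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
		t.newly_all_frozen = true;

	return t;
}
```

With this model, *vm_page_frozen in lazy_scan_prune() corresponds to
newly_all_visible_frozen || newly_all_frozen, as in the hunk above.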



  [text/x-patch] v43-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (5.6K, 5-v43-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 7dc943a7141988e2568a73136cab96829ea0b625 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v43 04/10] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so if we make this change we can remove all of the
XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 29 +++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++++-----------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c43b192b163..4f7220d17af 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -2539,6 +2539,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	uint8		info;
 	uint8		regbuf_flags_heap;
 
+	Page		heap_page = BufferGetPage(buffer);
+
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
 	xlhp_freeze_plans freeze_plans;
@@ -2557,14 +2559,18 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	/*
 	 * We can avoid an FPI of the heap page if the only modification we are
 	 * making to it is to set PD_ALL_VISIBLE and checksums/wal_log_hints are
-	 * disabled. Note that if we explicitly skip an FPI, we must not stamp the
-	 * heap page with this record's LSN. Recovery skips records <= the stamped
-	 * LSN, so this could lead to skipping an earlier FPI needed to repair a
-	 * torn page.
+	 * disabled.
+	 *
+	 * However, if the page has never been WAL-logged (LSN is invalid), we
+	 * must force an FPI regardless.  This can happen when another backend
+	 * extends the heap, initializes the page, and then fails before WAL-
+	 * logging it.  Since heap extension is not WAL-logged, recovery might try
+	 * to replay our record and find that the page isn't initialized, which
+	 * would cause a PANIC.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+	if (!XLogRecPtrIsValid(PageGetLSN(heap_page)))
+		regbuf_flags_heap |= REGBUF_FORCE_IMAGE;
+	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2679,12 +2685,13 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * See comment at the top of the function about regbuf_flags_heap for
-	 * details on when we can advance the page LSN.
+	 * If we explicitly skip an FPI, we must not stamp the heap page with this
+	 * record's LSN. Recovery skips records <= the stamped LSN, so this could
+	 * lead to skipping an earlier FPI needed to repair a torn page.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (!(regbuf_flags_heap & REGBUF_NO_IMAGE))
 	{
 		Assert(BufferIsDirty(buffer));
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(heap_page, recptr);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23deabd8c01..63e6199241c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1929,33 +1929,43 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
 			MarkBufferDirty(buf);
 
+			PageSetAllVisible(page);
+			PageClearPrunable(page);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
 			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
 			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				!XLogRecPtrIsValid(PageGetLSN(page)))
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 
-			PageSetAllVisible(page);
-			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
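
A note on the FPI handling in log_heap_prune_and_freeze() above: the sketch
below models the decision as a standalone function, so the three outcomes
(forced image, no image, default) are easy to see. It is not code from the
patch. choose_heap_regbuf_flags() and should_stamp_heap_lsn() are
hypothetical helpers, the REGBUF_* values are illustrative stand-ins for
the ones in xloginsert.h, starting from REGBUF_STANDARD is an assumption,
and hint_bit_wal_needed stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the flags in xloginsert.h */
#define REGBUF_FORCE_IMAGE	0x01
#define REGBUF_NO_IMAGE		0x02
#define REGBUF_STANDARD		0x08

/*
 * Hypothetical model of how the heap buffer's registration flags are
 * chosen.  page_lsn_valid is false for a page that was initialized but
 * never WAL-logged.
 */
static uint8_t
choose_heap_regbuf_flags(bool page_lsn_valid, bool do_prune, int nfrozen,
						 bool do_set_vm, bool hint_bit_wal_needed)
{
	uint8_t		flags = REGBUF_STANDARD;	/* assumed starting value */

	assert(nfrozen >= 0);

	if (!page_lsn_valid)
		flags |= REGBUF_FORCE_IMAGE;	/* replay must not see a new page */
	else if (!do_prune && nfrozen == 0 &&
			 (!do_set_vm || !hint_bit_wal_needed))
		flags |= REGBUF_NO_IMAGE;	/* only PD_ALL_VISIBLE changes */

	return flags;
}

/* The heap page LSN is advanced unless the FPI was explicitly skipped */
static bool
should_stamp_heap_lsn(uint8_t flags)
{
	return (flags & REGBUF_NO_IMAGE) == 0;
}
```

In this model, an empty page with an invalid LSN always gets a full-page
image and has its LSN stamped, which is the case lazy_scan_new_or_empty()
previously covered with log_newpage_buffer().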



  [text/x-patch] v43-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.5K, 6-v43-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 2257e5018bdb68eda19ab05d0ea3689f7d94a6f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v43 05/10] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 150 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 63 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d64c403f2f0 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * XXX: We should consider not masking PD_ALL_VISIBLE during WAL
+	 * consistency checking.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4f7220d17af..41bfb6711c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1250,8 +1250,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 63e6199241c..f698c2d899b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1939,11 +1939,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2821,9 +2821,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..21e89c38f0a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,31 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if they
+ * are set incorrectly, vacuum may skip pages and wrongly advance relfrozenxid,
+ * or index-only scans may skip reading heap pages and return wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +231,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +251,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..adc858c2a97 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4421,7 +4421,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v43-0006-Track-which-relations-are-modified-by-a-query.patch (8.7K, 7-v43-0006-Track-which-relations-are-modified-by-a-query.patch)
  download | inline diff:
From 636349f40265854b318ccf46700ec57731db8793 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v43 06/10] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the PlannedStmt.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c        | 47 ++++++++++++++++++++++++++
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  4 +++
 src/backend/executor/nodeModifyTable.c | 18 ++++++++++
 src/backend/optimizer/plan/planner.c   | 21 +++++++++++-
 src/include/nodes/plannodes.h          | 10 ++++++
 6 files changed, 100 insertions(+), 1 deletion(-)
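
The superset relationship described in the commit message can be sketched
in isolation. This is only an illustration, not the patch's code: RelidSet
and the relid_set_* / compute_modified_relids names are hypothetical
stand-ins for Bitmapset and the planner logic, and the toy set handles only
range-table indexes 1..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: one bit per RT index (1..63). */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static bool
relid_set_is_member(RelidSet set, int rti)
{
	return rti > 0 && rti < 64 && (set & (UINT64_C(1) << rti)) != 0;
}

/* True if every member of a is also a member of b. */
static bool
relid_set_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/*
 * Planner side: union of result relations and row-marked relations.
 * Row marks are included regardless of their type, since the set is
 * only a hint.
 */
static RelidSet
compute_modified_relids(const int *result_rels, int nresult,
						const int *rowmark_rels, int nrowmarks)
{
	RelidSet	modified = 0;

	for (int i = 0; i < nresult; i++)
		modified = relid_set_add(modified, result_rels[i]);
	for (int i = 0; i < nrowmarks; i++)
		modified = relid_set_add(modified, rowmark_rels[i]);
	return modified;
}
```

The executor-side assertion then amounts to checking that the set of
relations actually opened is a subset of the planner's set; the reverse
need not hold, because pruned partitions and parent row marks stay in the
planner's set.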

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..3f134f9a34d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -90,6 +90,9 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
 										 Bitmapset *modifiedCols,
 										 AclMode requiredPerms);
 static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
+#ifdef USE_ASSERT_CHECKING
+static void ExecCheckModifiedRelIds(EState *estate);
+#endif
 static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
 										TupleTableSlot *slot,
@@ -827,6 +830,46 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 }
 
 
+/*
+ * ExecCheckModifiedRelIds
+ *		Verify that every relation the executor actually opened for modification
+ *		or row locking is present in the planner's modifiedRelids.
+ *
+ * The planner's set may be a superset of what the executor touches, because it
+ * includes partitions that were pruned at runtime and parent row marks that the
+ * executor skips.
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+ExecCheckModifiedRelIds(EState *estate)
+{
+	PlannedStmt *plannedstmt = estate->es_plannedstmt;
+	Bitmapset  *executor_relids = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);
+
+		if (rri->ri_RangeTableIndex != 0)
+			executor_relids = bms_add_member(executor_relids,
+											 rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (int i = 0; i < estate->es_range_table_size; i++)
+		{
+			if (estate->es_rowmarks[i] != NULL)
+				executor_relids = bms_add_member(executor_relids,
+												 estate->es_rowmarks[i]->rti);
+		}
+	}
+	Assert(bms_is_subset(executor_relids, plannedstmt->modifiedRelids));
+	bms_free(executor_relids);
+}
+#endif
+
+
 /* ----------------------------------------------------------------
  *		InitPlan
  *
@@ -992,6 +1035,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	ExecCheckModifiedRelIds(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..d67f24fca8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,10 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		/* verify this relation is in the planner's modifiedRelids */
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..6b4ee4f9378 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,16 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * Verify this relation is in the planner's set of modified relations.
+	 * Partitions opened by tuple routing have ri_RangeTableIndex == 0 because
+	 * they have no range table entry, so we can only check relations that are
+	 * in the range table.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1533,10 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2219,10 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..847af979e31 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.  This is a
+	 * superset of what the executor will actually modify/lock at runtime,
+	 * because runtime partition pruning may eliminate some result relations,
+	 * and parent row marks are included here but skipped by the executor.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+
+		modifiedRelids = bms_add_member(modifiedRelids, rc->rti);
+	}
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..841c7707c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,16 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 *
+	 * Computed by the planner, this is a superset of what the executor will
+	 * actually touch at runtime, because it includes partitions that may be
+	 * pruned and parent row marks that the executor skips.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0



  [text/x-patch] v43-0007-Thread-flags-through-begin-scan-APIs.patch (32.9K, 8-v43-0007-Thread-flags-through-begin-scan-APIs.patch)
  download | inline diff:
From 9433b29071a93383251c37773b6d1b4a512f9565 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v43 07/10] Thread flags through begin-scan APIs

Add a user-settable flags parameter to the table_beginscan_* wrappers,
index_beginscan(), table_index_fetch_begin(), and the table
AM callback index_fetch_begin(). This allows users to pass additional
context to be used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)
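
A minimal sketch of the split between wrapper-chosen (internal) flags and
caller-supplied flags, as done in the table_beginscan_parallel() hunk
below. All names here (SKETCH_*, ScanSketch, beginscan_*_sketch) are
hypothetical, and the sketch assumes the common routine simply ORs the two
sets together, which this part of the patch does not show.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real SO_* values are defined elsewhere. */
#define SKETCH_SO_TYPE_SEQSCAN		(1u << 0)
#define SKETCH_SO_ALLOW_PAGEMODE	(1u << 1)
#define SKETCH_SO_TEMP_SNAPSHOT		(1u << 2)
#define SKETCH_SO_USER_HINT			(1u << 8)	/* caller-supplied bit */

typedef struct ScanSketch
{
	uint32_t	flags;
} ScanSketch;

/* Assumption: internal and caller flags are merged with a bitwise OR. */
static ScanSketch
beginscan_common_sketch(uint32_t internal_flags, uint32_t user_flags)
{
	ScanSketch	scan;

	scan.flags = internal_flags | user_flags;
	return scan;
}

/*
 * The wrapper decides its own internal flags and passes the caller's
 * flags through unchanged.
 */
static ScanSketch
beginscan_parallel_sketch(int snapshot_was_serialized, uint32_t flags)
{
	uint32_t	internal_flags = SKETCH_SO_TYPE_SEQSCAN |
		SKETCH_SO_ALLOW_PAGEMODE;

	if (snapshot_was_serialized)
		internal_flags |= SKETCH_SO_TEMP_SNAPSHOT;

	return beginscan_common_sketch(internal_flags, flags);
}
```

Existing callers that have nothing to add pass 0, as every converted call
site in this patch does.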

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..29d7c3514b6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 219f604df7b..ec9bbfe554a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22882,7 +22882,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23346,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..f733be0220c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -794,7 +795,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +861,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..1a101df492b 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1730,7 +1732,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1796,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e123dda090f..c6aec63a505 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
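
For readers skimming the tableam.h hunks above, here is a minimal standalone sketch of the flag-validation idea: internally chosen scan flags and caller-supplied flags travel separately, and the caller's set is checked against a mask before the two are merged. The DEMO_* names and bit values are hypothetical stand-ins for this illustration only; they are not the definitions in tableam.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few ScanOptions bits (values illustrative). */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,		/* chosen internally */
	DEMO_SO_ALLOW_PAGEMODE = 1 << 1,	/* chosen internally */
	DEMO_SO_USER_HINT = 1 << 2			/* something a caller may pass */
};

/* Mask of bits that callers must never pass themselves. */
#define DEMO_SO_INTERNAL_FLAGS \
	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE)

/* Returns nonzero if the caller-supplied flags contain no internal bits. */
static int
demo_user_flags_ok(uint32_t user_flags)
{
	return (user_flags & DEMO_SO_INTERNAL_FLAGS) == 0;
}

/*
 * Validate the caller's flags, then OR them into the internal set, in the
 * same spirit as the Assert + "flags |= user_flags" in the patch.
 */
static uint32_t
demo_merge_scan_flags(uint32_t internal_flags, uint32_t user_flags)
{
	assert(demo_user_flags_ok(user_flags));
	return internal_flags | user_flags;
}
```

Keeping the two sets apart until the merge point means a caller that accidentally passes a scan-type bit trips the assertion in one central place, rather than silently changing the scan type.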



  [text/x-patch] v43-0008-Pass-down-information-on-table-modification-to-s.patch (11.3K, 9-v43-0008-Pass-down-information-on-table-modification-to-s.patch)
From 1648a5f09de8bacd005d383469436f21a73ced7d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v43 08/10] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           | 10 ++++++++++
 8 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f733be0220c..de9db45322c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -795,7 +798,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -861,7 +867,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 1a101df492b..9df4a699504 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1732,7 +1738,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1796,7 +1804,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..31c4192b67e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,16 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ */
+static inline bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
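
As a rough illustration of what ScanRelIsReadOnly() and its callers in this patch compute, the sketch below derives the read-only hint from a set of modified range-table indexes. It assumes a plain 64-bit mask in place of the Bitmapset the patch reads from the planned statement; DemoRelidSet and the demo_* helpers are hypothetical names, and only the 1 << 10 bit value is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns to SO_HINT_REL_READ_ONLY. */
#define DEMO_SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical simplification: bit i set means the relation at range-table
 * index i is modified by the query. The patch uses a Bitmapset instead.
 */
typedef uint64_t DemoRelidSet;

/* True if the scanned relation is not among the modified relations. */
static bool
demo_scan_rel_is_read_only(unsigned scanrelid, DemoRelidSet modified_relids)
{
	assert(scanrelid < 64);
	return (modified_relids & (UINT64_C(1) << scanrelid)) == 0;
}

/* Flags a scan node would hand to the begin-scan call. */
static uint32_t
demo_scan_flags(unsigned scanrelid, DemoRelidSet modified_relids)
{
	return demo_scan_rel_is_read_only(scanrelid, modified_relids) ?
		DEMO_SO_HINT_REL_READ_ONLY : 0;
}
```

With relation 2 as the only modified relation, a scan of relation 1 gets the hint while a scan of relation 2 gets no flags, which mirrors the per-node ternary repeated throughout the executor hunks.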



  [text/x-patch] v43-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 10-v43-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From a8ec9732ff892dff8146a1d0e637dd30de2dcf53 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v43 09/10] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which can
trigger a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)
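
The control flow described in the commit message can be sketched in isolation: the read-only hint from the scan decides whether on-access pruning asks for the VM to be set, and the prune state later tests that option bit. This is a simplified sketch, not the patch's code; the DEMO_* option values and the demo_* helpers are hypothetical, and the real options are defined in heapam.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bits for illustration only. */
#define DEMO_PRUNE_ALLOW_FAST_PATH (1u << 0)
#define DEMO_PRUNE_SET_VM          (1u << 1)

/*
 * Build the options for on-access pruning: the fast path is always allowed,
 * and setting the VM is requested only when the relation is read-only for
 * the current query.
 */
static uint32_t
demo_on_access_prune_options(bool rel_read_only)
{
	uint32_t	options = DEMO_PRUNE_ALLOW_FAST_PATH;

	if (rel_read_only)
		options |= DEMO_PRUNE_SET_VM;
	return options;
}

/* Normalize the option bit to a bool, as the prune setup code does. */
static bool
demo_attempt_set_vm(uint32_t options)
{
	return (options & DEMO_PRUNE_SET_VM) != 0;
}
```

Comparing the masked value against zero keeps the result a clean boolean even though the option occupies a higher bit.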

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 41bfb6711c1..235d21c1a41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -919,21 +925,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1165,7 +1187,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c6aec63a505..90ca5a2cfa8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v43-0010-Set-pd_prune_xid-on-insert.patch (8.8K, 11-v43-0010-Set-pd_prune_xid-on-insert.patch)
From 0d4edeee146d7b7f24efa1de4d7ded1e0f5c5111 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v43 10/10] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after it has been filled with
newly inserted tuples. The page can then be set all-visible while it is
still in shared buffers, avoiding the potential I/O amplification of
vacuum later having to scan the page and set it all-visible. It also
enables index-only scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 235d21c1a41..aa9221f5eb6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1916,17 +1917,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-23 21:54  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-03-23 21:54 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sun, Mar 22, 2026 at 3:58 PM Melanie Plageman
<[email protected]> wrote:
>
> I've pushed the first two patches. Attached are the remaining 10. No
> changes were made to those from the previous version.

I'm planning on pushing 0001-0005 in the morning.

I've made some significant changes to 0006 and realized I need some
help. 0006 tracks which relations are modified by a query. This new
version (v44) uses relation OIDs instead of range table indexes to
handle cases where the same relation appears more than once in the
range table (e.g. INSERT INTO foo SELECT * FROM foo; foo appears
twice). It computes modifiedRelOids (a list of the OIDs of relations
modified by the query) in the planner and stores it in the
PlannedStmt. There is one big issue I'm not sure how to solve:

For queries like INSERT INTO ptable SELECT * FROM ptable, where ptable
is a partitioned table, though we scan ptable, we don't know when
executing that scan that we will then modify ptable with the insert.

In my patch, I've added a call to find_all_inheritors() when populating
modifiedRelOids, but I realize this probably isn't acceptable to add
to the planner from a performance perspective.

I'm looking for other ways to solve the problem. Now, for my use case
(setting the VM), we don't mind setting the VM during the table scan
part of the query. Whatever page gets the inserted tuple will clear
all-visible -- but that is just one page out of many. However, future
users of modifiedRelOids will likely expect it to contain all modified
relation oids.

I could also check when setting up the scan descriptor if the leaf
partition's parents (would have to check full ancestry) are in
modifiedRelOids. This also doesn't address the problem of future users
thinking modifiedRelOids is complete.

Note that it also means partitions that aren't modified will be
included in modifiedRelOids if one of the partitions is being
modified.

I could also just change the name of the modifiedRelOids to something
that doesn't make future users think it's exhaustive.
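
For concreteness, the ancestry check mentioned above (walking a leaf
partition's full ancestry and testing each OID against modifiedRelOids)
could be sketched roughly as follows. This is a standalone toy model,
not PostgreSQL code: RelEntry, oid_in_list(), lookup_parent(), and
rel_or_ancestor_modified() are hypothetical names invented for the
illustration, and the flat array stands in for catalog lookups (a real
version would presumably use something like get_partition_ancestors()).

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the PostgreSQL types; assumptions for this sketch only. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical catalog row: a relation and its direct parent (if any). */
typedef struct RelEntry
{
	Oid			relid;
	Oid			parent;		/* InvalidOid for a root or plain table */
} RelEntry;

/* Linear membership test against the list of modified relation OIDs. */
static bool
oid_in_list(const Oid *oids, size_t noids, Oid relid)
{
	for (size_t i = 0; i < noids; i++)
	{
		if (oids[i] == relid)
			return true;
	}
	return false;
}

/* Return the direct parent of relid, or InvalidOid if it has none. */
static Oid
lookup_parent(const RelEntry *rels, size_t nrels, Oid relid)
{
	for (size_t i = 0; i < nrels; i++)
	{
		if (rels[i].relid == relid)
			return rels[i].parent;
	}
	return InvalidOid;
}

/*
 * Walk from relid up through its ancestors. Report true as soon as the
 * relation itself or any ancestor appears in the modified list.
 */
static bool
rel_or_ancestor_modified(const RelEntry *rels, size_t nrels,
						 const Oid *modified, size_t nmodified,
						 Oid relid)
{
	while (relid != InvalidOid)
	{
		if (oid_in_list(modified, nmodified, relid))
			return true;
		relid = lookup_parent(rels, nrels, relid);
	}
	return false;
}
```

With only the partitioned parent in the modified list, a scan of any
leaf below it would be treated as touching a modified relation. The
list itself still would not name the leaves, so this does nothing for
the completeness concern raised above.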

- Melanie


Attachments:

  [text/x-patch] v44-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (18.4K, 2-v44-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From cb89cc1c911f74f66d7febe69cbef95cef5c614e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v44 01/10] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
also enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 31 ++++++++++-
 src/backend/access/heap/pruneheap.c         | 54 ++++++++++---------
 src/backend/access/heap/vacuumlazy.c        | 60 ++++++++++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 19 ++++---
 src/include/access/heapam.h                 |  4 ++
 src/include/utils/snapmgr.h                 |  8 ++-
 7 files changed, 123 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..9a7bf331df7 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,32 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * If allow_update is true, the GlobalVisState boundaries may be updated. If
+ * it is false, they definitely will not be updated.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on
+ * the required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid,
+								  bool allow_update)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid, allow_update);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
@@ -1354,7 +1380,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after, true))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1420,7 +1446,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
 	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+									   HeapTupleHeaderGetRawXmax(tuple),
+									   true);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b383b0fca8b..8eb3afda4bf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -160,10 +160,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -281,7 +284,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid, true))
 		return;
 
 	/*
@@ -1081,6 +1084,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible. This should be done before determining whether or not
+	 * to opportunistically freeze.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(prstate.vistest,
+										  prstate.visibility_cutoff_xid,
+										  true))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,7 +1299,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 	 * if the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after, true))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1749,29 +1765,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..797973d7bd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,14 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
+										   bool allow_update_vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2090,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2853,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3615,19 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
-
+	/*
+	 * Pass allow_update_vistest as false so that the GlobalVisState
+	 * boundaries used here match those used by the pruning code we are
+	 * cross-checking. Allowing an update could move the boundaries between
+	 * the two calls, causing a spurious assertion failure.
+	 */
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3648,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility. If allow_update_vistest is true,
+ * the boundaries of the GlobalVisState may be updated when checking the
+ * visibility of the newest live XID on the page.
  *
  * Output parameters:
  *
@@ -3661,7 +3669,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
+							   bool allow_update_vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3751,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3760,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3799,20 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+										  allow_update_vistest))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..c461f8dc02d 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisTestIsRemovableXid(vistest, dt->xid, true)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0f913897acc..27e5adeebfb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4223,11 +4223,17 @@ GlobalVisUpdate(void)
  * The state passed needs to have been initialized for the relation fxid is
  * from (NULL is also OK), otherwise the result may not be correct.
  *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated
+ * even if it would otherwise be beneficial. This is useful for callers that
+ * do not want GlobalVisState to advance at all, for example because they need
+ * a conservative answer based on the current boundaries.
+ *
  * See comment for GlobalVisState for details.
  */
 bool
 GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+								FullTransactionId fxid,
+								bool allow_update)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4248,7 +4254,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 	 * might not exist a snapshot considering fxid running. If it makes sense,
 	 * update boundaries and recheck.
 	 */
-	if (GlobalVisTestShouldUpdate(state))
+	if (allow_update && GlobalVisTestShouldUpdate(state))
 	{
 		GlobalVisUpdate();
 
@@ -4268,7 +4274,8 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid,
+							bool allow_update)
 {
 	FullTransactionId fxid;
 
@@ -4282,7 +4289,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, allow_update);
 }
 
 /*
@@ -4296,7 +4303,7 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, true);
 }
 
 /*
@@ -4310,7 +4317,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisTestIsRemovableXid(state, xid, true);
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..ca5e8d1794f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,10 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidConsideredRunning(GlobalVisState *state,
+											  TransactionId xid,
+											  bool allow_update);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 8c919d2640e..c7a869bc2b2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -115,8 +115,12 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state,
+										TransactionId xid,
+										bool allow_update);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
+											FullTransactionId fxid,
+											bool allow_update);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
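
For readers following the allow_update plumbing in the patch above, here is a minimal standalone sketch of the two-boundary test and of what the new flag changes. It is a toy model, not the procarray.c code. ToyVisState, toy_vis_update() and toy_is_removable() are hypothetical names. The fresh_maybe_needed field stands in for whatever a real horizon recomputation would return. The "should update" check is a placeholder for GlobalVisTestShouldUpdate(), whose actual heuristic is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy stand-in for GlobalVisState. maybe_needed and definitely_needed mirror
 * the names used in the patch; the other fields are assumptions made for
 * this illustration only.
 */
typedef struct ToyVisState
{
	uint64_t	maybe_needed;	/* XIDs below this are certainly removable */
	uint64_t	definitely_needed;	/* XIDs at or above this certainly are not */
	uint64_t	fresh_maybe_needed; /* what a recomputation would yield */
	int			nupdates;		/* number of boundary refreshes performed */
} ToyVisState;

/* Placeholder for GlobalVisUpdate(): advance the lower boundary. */
static void
toy_vis_update(ToyVisState *state)
{
	assert(state->fresh_maybe_needed <= state->definitely_needed);
	state->maybe_needed = state->fresh_maybe_needed;
	state->nupdates++;
}

/*
 * Return true if no snapshot can still consider fxid running.
 *
 * With allow_update = false the boundaries are never moved, so an XID that
 * falls between the two boundaries always gets the conservative answer.
 */
static bool
toy_is_removable(ToyVisState *state, uint64_t fxid, bool allow_update)
{
	/* Older than maybe_needed: visible to everyone. */
	if (fxid < state->maybe_needed)
		return true;

	/* At or past definitely_needed: some snapshot may still need it. */
	if (fxid >= state->definitely_needed)
		return false;

	/* In between: refresh the boundaries and recheck, if permitted. */
	if (allow_update && state->fresh_maybe_needed > state->maybe_needed)
	{
		toy_vis_update(state);
		return fxid < state->maybe_needed;
	}

	return false;
}
```

With boundaries 100 and 200 and a pending refresh to 160, XID 150 is reported as not removable when allow_update is false and the state is left untouched. The same call with allow_update set to true refreshes the lower boundary first and then reports 150 as removable.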



  [text/x-patch] v44-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.6K, 3-v44-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From 0968ef2bc8aabd448a3ce97365f74859d83cb68d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v44 02/10] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain set_all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 138 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 73 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8eb3afda4bf..301fcfe7024 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,14 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -177,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /*
@@ -458,53 +452,43 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing. This would lead to
+	 * incorrectly setting the visibility map all-frozen for this page.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -734,7 +718,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1012,9 +995,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'set_all_frozen' is always set to false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1091,9 +1073,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * to opportunistically freeze.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidConsideredRunning(prstate.vistest,
-										  prstate.visibility_cutoff_xid,
+										  prstate.newest_live_xid,
 										  true))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
@@ -1245,7 +1227,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1706,6 +1688,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1753,32 +1736,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 797973d7bd0..696919e35dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -479,7 +479,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2829,7 +2829,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2855,14 +2855,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2903,7 +2903,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3617,7 +3617,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 	/*
@@ -3630,7 +3630,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3655,7 +3655,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3674,7 +3674,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3684,7 +3684,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3773,9 +3773,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3805,8 +3805,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *newest_live_xid,
 										  allow_update_vistest))
 	{
 		all_visible = false;
-- 
2.43.0
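
The bookkeeping change in the patch above can be sketched outside the tree as follows. This is a simplified model under stated assumptions, not the pruneheap.c code. ToyTuple, ToyPruneState and the toy_* functions are hypothetical, and the xmin_committed flag stands in for the HeapTupleHeaderXminCommitted() hint-bit check. The comparison helper follows the modulo-2^32 convention of TransactionIdFollows(). The point it illustrates is that newest_live_xid keeps advancing after set_all_visible has already been cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

/* One live tuple, reduced to the two facts this sketch needs. */
typedef struct ToyTuple
{
	ToyXid		xmin;
	bool		xmin_committed; /* stand-in for the xmin-committed hint bit */
} ToyTuple;

typedef struct ToyPruneState
{
	bool		set_all_visible;
	ToyXid		newest_live_xid;
} ToyPruneState;

static bool
toy_xid_is_normal(ToyXid xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b" for normal XIDs. */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	if (!toy_xid_is_normal(a) || !toy_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Scan the live tuples of one page. A tuple whose inserter is not hinted
 * committed clears set_all_visible, but later tuples still contribute to
 * newest_live_xid, so the value is never stale.
 */
static void
toy_scan_live_tuples(ToyPruneState *prstate,
					 const ToyTuple *tuples, size_t ntuples)
{
	prstate->set_all_visible = true;
	prstate->newest_live_xid = TOY_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		if (!tuples[i].xmin_committed)
		{
			prstate->set_all_visible = false;
			continue;
		}

		/* Track newest xmin on page. */
		if (toy_xid_follows(tuples[i].xmin, prstate->newest_live_xid) &&
			toy_xid_is_normal(tuples[i].xmin))
			prstate->newest_live_xid = tuples[i].xmin;
	}
}
```

For a page with committed xmins 100, 180 and 120 plus one tuple whose xmin 250 is not hinted committed, the sketch leaves set_all_visible false and newest_live_xid at 180, regardless of where the unhinted tuple appears in the scan order.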



  [text/x-patch] v44-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.2K, 4-v44-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From c440d93887ed97ee8ab42004da76417e29fa2a92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v44 03/10] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications. This reduces WAL volume produced by vacuum.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Earlier version Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 245 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 205 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 301fcfe7024..4d6d5e92773 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /*
@@ -228,6 +232,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -395,6 +400,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -907,6 +913,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM on-access for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * won't be attempted, there is no remaining work and we can use the fast path
@@ -940,8 +982,6 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -976,7 +1016,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -991,12 +1032,10 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'set_all_frozen' is always set to false when the
- * HEAP_PAGE_PRUNE_FREEZE option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1024,8 +1063,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1124,6 +1165,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1145,14 +1211,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint. If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1166,6 +1235,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1173,29 +1263,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1205,33 +1278,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible(). It's also a
+	 * valuable cross-check of the page state after pruning and freezing.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 696919e35dd..23deabd8c01 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   bool allow_update_vistest,
@@ -2022,8 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2074,32 +2065,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2120,6 +2085,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2133,71 +2109,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3613,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ca5e8d1794f..0ab322bf58b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0
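
For readers following the 0003 patch above, here is a minimal standalone
sketch of how the record-wide snapshot conflict horizon is chosen: start
from InvalidTransactionId and keep the newest XID among the changes that
are actually being made. This is not PostgreSQL code. xid_follows() is a
simplified stand-in for TransactionIdFollows(), and
combine_conflict_xid() and conflict_xid_examples() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Simplified stand-in for TransactionIdFollows(): is id1 newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special (non-normal) XIDs compare as plain unsigned integers. */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;

	/* Normal XIDs compare modulo 2^32, so wraparound is handled. */
	return (int32_t) (id1 - id2) > 0;
}

/*
 * Hypothetical helper mirroring the three-step selection in the patch:
 * each change contributes its horizon only if that change will happen.
 */
static TransactionId
combine_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
					 bool do_freeze, TransactionId freeze_conflict_xid,
					 bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}

/* Usage examples; the XID values are made up. */
void
conflict_xid_examples(void)
{
	/* All three changes happen: the newest of the three horizons wins. */
	assert(combine_conflict_xid(true, 100, true, 120, true, 110) == 120);

	/* No change happens: the horizon stays invalid. */
	assert(combine_conflict_xid(false, 100, false, 120, false, 110) ==
		   InvalidTransactionId);

	/* After wraparound, XID 5 is newer than XID 4294967290. */
	assert(combine_conflict_xid(true, 4294967290u, false, 0, true, 5) == 5);
}
```

The point of doing this before the critical section, as the patch does,
is that a single record now covers pruning, freezing and the VM update,
so one horizon has to be safe for all of them.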



  [text/x-patch] v44-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (5.7K, 5-v44-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 134e19504883dc3b07c506332dd23533e381b699 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v44 04/10] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so by making this change we can next remove all of
the XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Earlier version Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 29 +++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++++-----------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4d6d5e92773..fe9564b26c7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -2541,6 +2541,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	uint8		info;
 	uint8		regbuf_flags_heap;
 
+	Page		heap_page = BufferGetPage(buffer);
+
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
 	xlhp_freeze_plans freeze_plans;
@@ -2559,14 +2561,18 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	/*
 	 * We can avoid an FPI of the heap page if the only modification we are
 	 * making to it is to set PD_ALL_VISIBLE and checksums/wal_log_hints are
-	 * disabled. Note that if we explicitly skip an FPI, we must not stamp the
-	 * heap page with this record's LSN. Recovery skips records <= the stamped
-	 * LSN, so this could lead to skipping an earlier FPI needed to repair a
-	 * torn page.
+	 * disabled.
+	 *
+	 * However, if the page has never been WAL-logged (LSN is invalid), we
+	 * must force an FPI regardless.  This can happen when another backend
+	 * extends the heap, initializes the page, and then fails before WAL-
+	 * logging it.  Since heap extension is not WAL-logged, recovery might try
+	 * to replay our record and find that the page isn't initialized, which
+	 * would cause a PANIC.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+	if (!XLogRecPtrIsValid(PageGetLSN(heap_page)))
+		regbuf_flags_heap |= REGBUF_FORCE_IMAGE;
+	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2681,12 +2687,13 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * See comment at the top of the function about regbuf_flags_heap for
-	 * details on when we can advance the page LSN.
+	 * If we explicitly skip an FPI, we must not stamp the heap page with this
+	 * record's LSN. Recovery skips records <= the stamped LSN, so this could
+	 * lead to skipping an earlier FPI needed to repair a torn page.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (!(regbuf_flags_heap & REGBUF_NO_IMAGE))
 	{
 		Assert(BufferIsDirty(buffer));
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(heap_page, recptr);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23deabd8c01..63e6199241c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1929,33 +1929,43 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
 			MarkBufferDirty(buf);
 
+			PageSetAllVisible(page);
+			PageClearPrunable(page);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
 			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
 			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				!XLogRecPtrIsValid(PageGetLSN(page)))
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 
-			PageSetAllVisible(page);
-			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
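
The full-page-image decision in the 0004 patch above can be read as a
small truth table. The sketch below restates it as standalone C, under
stated assumptions: the SKETCH_* flag values are made up and only stand in
for REGBUF_FORCE_IMAGE and REGBUF_NO_IMAGE, hint_bits_need_wal stands in
for XLogHintBitIsNeeded(), and both function names are hypothetical. Only
the image-related bits are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real REGBUF_* values are not assumed. */
#define SKETCH_FORCE_IMAGE	0x01
#define SKETCH_NO_IMAGE		0x02

/*
 * Decide how the heap buffer would be registered:
 *  - a page that was never WAL-logged (invalid LSN) always gets an FPI;
 *  - otherwise, if the only heap change is setting PD_ALL_VISIBLE and
 *    hint-bit changes need no WAL, the FPI is skipped;
 *  - in every other case the default FPI rules apply (no flag set).
 */
static uint8_t
choose_heap_image_flags(bool page_lsn_valid, bool do_prune, int nfrozen,
						bool do_set_vm, bool hint_bits_need_wal)
{
	uint8_t		flags = 0;

	assert(nfrozen >= 0);

	if (!page_lsn_valid)
		flags |= SKETCH_FORCE_IMAGE;
	else if (!do_prune && nfrozen == 0 &&
			 (!do_set_vm || !hint_bits_need_wal))
		flags |= SKETCH_NO_IMAGE;

	return flags;
}

/* The page LSN is stamped unless the FPI was explicitly skipped. */
static bool
should_stamp_page_lsn(uint8_t flags)
{
	return (flags & SKETCH_NO_IMAGE) == 0;
}
```

Deriving the LSN decision from the flags, rather than repeating the
do_prune/nfrozen/do_set_vm condition, keeps the two from drifting apart,
which appears to be the intent of the second pruneheap.c hunk in that
patch.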



  [text/x-patch] v44-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.6K, 6-v44-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 42b15a654c87c36aadd1768f3f2fc915019ee44c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v44 05/10] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 151 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 64 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..f32e3911a57 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * PD_ALL_VISIBLE is masked during WAL consistency checking. It is worth
+	 * investigating if we could stop doing this.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fe9564b26c7..fc5345e1dff 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1252,8 +1252,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 63e6199241c..f698c2d899b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1939,11 +1939,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2821,9 +2821,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..4fd470702aa 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,32 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if they
+ * are set, we may skip vacuuming pages and advance relfrozenxid or skip
+ * reading heap pages for an index-only scan. If they are incorrectly set,
+ * this can lead to data corruption and wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +232,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +252,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..adc858c2a97 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4421,7 +4421,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
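
For readers following the rename above: visibilitymap_set() locates a heap block's bits with the HEAPBLK_TO_MAPBLOCK / HEAPBLK_TO_MAPBYTE / HEAPBLK_TO_OFFSET macros and then ORs the flags into the map byte. The following is only a standalone sketch of that arithmetic, not the PostgreSQL code. It assumes the default 8 kB block size, a 24-byte page header, and two bits per heap block; every sketch_* and SKETCH_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not taken from the patch): 8 kB pages, 24-byte page header. */
#define SKETCH_BLCKSZ            8192
#define SKETCH_PAGE_HEADER       24
#define SKETCH_MAPSIZE           (SKETCH_BLCKSZ - SKETCH_PAGE_HEADER)

/* Two bits per heap block: all-visible (0x01) and all-frozen (0x02). */
#define SKETCH_BITS_PER_HEAPBLK  2
#define SKETCH_HEAPBLKS_PER_BYTE (8 / SKETCH_BITS_PER_HEAPBLK)
#define SKETCH_HEAPBLKS_PER_PAGE (SKETCH_MAPSIZE * SKETCH_HEAPBLKS_PER_BYTE)

#define SKETCH_ALL_VISIBLE 0x01
#define SKETCH_ALL_FROZEN  0x02
#define SKETCH_VALID_BITS  0x03

/* Which VM page holds the bits for this heap block. */
static uint32_t
sketch_map_block(uint32_t heapBlk)
{
	return heapBlk / SKETCH_HEAPBLKS_PER_PAGE;
}

/* Which byte within that VM page's map area. */
static uint32_t
sketch_map_byte(uint32_t heapBlk)
{
	return (heapBlk % SKETCH_HEAPBLKS_PER_PAGE) / SKETCH_HEAPBLKS_PER_BYTE;
}

/* Bit offset of this heap block's two bits within that byte. */
static uint8_t
sketch_map_offset(uint32_t heapBlk)
{
	return (uint8_t) ((heapBlk % SKETCH_HEAPBLKS_PER_BYTE) *
					  SKETCH_BITS_PER_HEAPBLK);
}

/*
 * OR the given flags into the map area of one VM page and return the
 * resulting status bits for heapBlk.  "map" stands in for the contents of
 * an already pinned and exclusively locked VM page.  No WAL is written
 * here: the caller is expected to log the change with the heap operation.
 */
static uint8_t
sketch_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = sketch_map_byte(heapBlk);
	uint8_t		mapOffset = sketch_map_offset(heapBlk);

	assert((flags & SKETCH_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible. */
	assert(flags != SKETCH_ALL_FROZEN);

	map[mapByte] |= (uint8_t) (flags << mapOffset);
	return (map[mapByte] >> mapOffset) & SKETCH_VALID_BITS;
}
```

Under these assumptions one VM page covers 32672 heap blocks, so heap block 32672 is the first block tracked by VM page 1, and heap block 5 lands in byte 1 at bit offset 2.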



  [text/x-patch] v44-0006-Track-which-relations-are-modified-by-a-query.patch (6.4K, 7-v44-0006-Track-which-relations-are-modified-by-a-query.patch)
From 8f637aeb39efe65e629f616fbf4362ce9476ea1a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v44 06/10] Track which relations are modified by a query

Save the OIDs of modified relations in a list in the PlannedStmt. A
later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map. Setting
the visibility map during a scan is counterproductive if the query is
going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this list is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  3 ++
 src/backend/executor/nodeModifyTable.c |  9 ++++++
 src/backend/optimizer/plan/planner.c   | 44 +++++++++++++++++++++++++-
 src/include/nodes/plannodes.h          |  6 ++++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..5c1cf51d71c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelOids = estate->es_plannedstmt->modifiedRelOids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..49b55d15e3e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,9 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+							   RelationGetRelid(erm->relation)));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..12ecdd383cc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,9 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+						   RelationGetRelid(resultRelationDesc)));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1526,9 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+						   RelationGetRelid(resultRelInfo->ri_RelationDesc)));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2211,9 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+						   RelationGetRelid(resultRelInfo->ri_RelationDesc)));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..039796773a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	List	   *modifiedRelOids = NIL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,46 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelOids from result relations and row marks.
+	 *
+	 * This is a superset of what the executor will actually modify/lock at
+	 * runtime, because runtime partition pruning may eliminate some result
+	 * relations, and parent row marks are included here but skipped by the
+	 * executor.
+	 *
+	 * For partitioned tables, modifiedRelOids is expanded to include all
+	 * descendant partition OIDs. This is necessary because tuple routing
+	 * lazily expands leaf partitions at execution time.
+	 */
+	foreach(lc, glob->resultRelations)
+	{
+		Index		rti = lfirst_int(lc);
+		RangeTblEntry *rte = rt_fetch(rti, glob->finalrtable);
+
+		modifiedRelOids = list_append_unique_oid(modifiedRelOids, rte->relid);
+
+		if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+		{
+			List	   *children = find_all_inheritors(rte->relid,
+													   NoLock, NULL);
+			ListCell   *lc2;
+
+			foreach(lc2, children)
+				modifiedRelOids = list_append_unique_oid(modifiedRelOids,
+														 lfirst_oid(lc2));
+		}
+	}
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+		RangeTblEntry *rte = rt_fetch(rc->rti, glob->finalrtable);
+
+		modifiedRelOids = list_append_unique_oid(modifiedRelOids, rte->relid);
+	}
+	result->modifiedRelOids = modifiedRelOids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..6a7008cd50a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * OIDs of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 */
+	List	   *modifiedRelOids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0
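
To illustrate the list built in standard_planner() above: OIDs from the result relations and from the row marks are appended with duplicates suppressed, and a later scan can test membership. The following is a standalone sketch using a fixed-size array rather than PostgreSQL's List API. All sketch_* names are hypothetical, and the sketch_scan_may_set_vm() check is only a guess at how the later commit might consult the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int SketchOid;

#define SKETCH_MAX_OIDS 64

/* Hypothetical stand-in for an OID list such as modifiedRelOids. */
typedef struct SketchOidList
{
	SketchOid	items[SKETCH_MAX_OIDS];
	size_t		len;
} SketchOidList;

static bool
sketch_list_member(const SketchOidList *list, SketchOid oid)
{
	for (size_t i = 0; i < list->len; i++)
	{
		if (list->items[i] == oid)
			return true;
	}
	return false;
}

/* Append oid unless it is already present (cf. list_append_unique_oid). */
static void
sketch_append_unique(SketchOidList *list, SketchOid oid)
{
	if (sketch_list_member(list, oid))
		return;
	assert(list->len < SKETCH_MAX_OIDS);
	list->items[list->len++] = oid;
}

/* Collect result relations first, then row-marked relations. */
static void
sketch_collect_modified(SketchOidList *out,
						const SketchOid *resultRels, size_t nresult,
						const SketchOid *rowMarkRels, size_t nmarks)
{
	for (size_t i = 0; i < nresult; i++)
		sketch_append_unique(out, resultRels[i]);
	for (size_t i = 0; i < nmarks; i++)
		sketch_append_unique(out, rowMarkRels[i]);
}

/*
 * Assumed use: a scan of a relation the query also modifies should not
 * bother setting the VM, because the page is likely to be dirtied again.
 */
static bool
sketch_scan_may_set_vm(const SketchOidList *modified, SketchOid scannedRel)
{
	return !sketch_list_member(modified, scannedRel);
}
```

Because the list is only a hint, including too many relations costs at most a missed opportunity to set the VM during a scan; that is why the patch can afford to include every row mark type and all descendant partitions.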



  [text/x-patch] v44-0007-Thread-flags-through-begin-scan-APIs.patch (32.9K, 8-v44-0007-Thread-flags-through-begin-scan-APIs.patch)
From 134bdfa0fa25bb74c954c179c88d7cc38ca14c56 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v44 07/10] Thread flags through begin-scan APIs

Add a user-settable flags parameter to the table_beginscan_* wrappers,
index_beginscan(), table_index_fetch_begin(), and the table AM callback
index_fetch_begin(). This allows users to pass additional context to be
used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..29d7c3514b6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 219f604df7b..ec9bbfe554a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22882,7 +22882,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23346,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..f733be0220c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -794,7 +795,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +861,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..1a101df492b 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1730,7 +1732,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1796,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0ab322bf58b..47cbf2a20cf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v44-0008-Pass-down-information-on-table-modification-to-s.patch (11.8K, 9-v44-0008-Pass-down-information-on-table-modification-to-s.patch)
From 8bdd2f0a153abc23cc7c87eab00255b95a05446c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v44 08/10] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
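For readers skimming the diff, here is a rough standalone sketch of the flow
described above: a scan node asks whether its relation is modified by the
query, turns the answer into a caller-level hint flag, and the beginscan
wrapper merges that hint with the flags it sets internally after asserting
that the two sets do not overlap. Everything prefixed with DEMO_/demo_ is
hypothetical, as are the bit values and the plain array standing in for the
list of modified relation OIDs; the real definitions are in tableam.h and
execUtils.c and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only (not those in tableam.h). */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,
	DEMO_SO_ALLOW_PAGEMODE = 1 << 1,
	DEMO_SO_HINT_REL_READ_ONLY = 1 << 2
};

/* Flags only the beginscan wrapper may set; callers must not pass them. */
#define DEMO_SO_INTERNAL_FLAGS \
	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE)

/*
 * Stand-in for ScanRelIsReadOnly(): returns 1 if relid does not appear in
 * the array of relation OIDs modified by the query, 0 otherwise.
 */
static int
demo_rel_is_read_only(const unsigned *modified, int nmodified, unsigned relid)
{
	for (int i = 0; i < nmodified; i++)
	{
		if (modified[i] == relid)
			return 0;
	}
	return 1;
}

/*
 * Stand-in for table_beginscan(): validates the caller-provided flags and
 * returns the combined flag word that would be handed to the table AM.
 */
static uint32_t
demo_beginscan_flags(uint32_t user_flags)
{
	uint32_t	internal_flags = DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE;

	assert((user_flags & DEMO_SO_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* What a scan node does when it first sets up its scan descriptor. */
static uint32_t
demo_scan_flags_for(const unsigned *modified, int nmodified, unsigned relid)
{
	uint32_t	user_flags = demo_rel_is_read_only(modified, nmodified, relid) ?
		DEMO_SO_HINT_REL_READ_ONLY : 0;

	return demo_beginscan_flags(user_flags);
}
```

In this sketch, calling demo_scan_flags_for() for a relation that is not in
the modified array yields a flag word with DEMO_SO_HINT_REL_READ_ONLY set,
while a relation that is in the array gets only the internal flags. Keeping
the hint outside the internal mask is what lets the assertion catch a caller
that passes a scan-type flag by mistake.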
 src/backend/executor/execUtils.c          |  8 ++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..ff05eca3a61 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,14 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
 }
 
+/* Return true if the scan node's relation is not modified by the query */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !list_member_oid(ss->ps.state->es_plannedstmt->modifiedRelOids,
+							RelationGetRelid(ss->ss_currentRelation));
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f733be0220c..de9db45322c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -795,7 +798,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -861,7 +867,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 1a101df492b..9df4a699504 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1732,7 +1738,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1796,7 +1804,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
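
The call sites changed in the patch above all follow one pattern: ask whether the scanned relation is read-only for this query, and if so pass SO_HINT_REL_READ_ONLY in the scan flags. Below is a minimal standalone sketch of that pattern. The body of ScanRelIsReadOnly() is not shown in this part of the diff, so scan_rel_is_read_only() and the ModifiedRelids bitmask are hypothetical stand-ins (assuming a set of modified range table indexes); only the flag's bit value is taken from the tableam.h hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value as added to ScanOptions in the tableam.h hunk. */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for a set of modified range table indexes:
 * bit N set means RT index N is modified by the query.
 */
typedef uint64_t ModifiedRelids;

/* Hypothetical model of ScanRelIsReadOnly(); the real body is not shown. */
static bool
scan_rel_is_read_only(ModifiedRelids modified, unsigned scanrelid)
{
    /* RT indexes are 1-based; this toy bitmask only holds indexes below 64. */
    assert(scanrelid >= 1 && scanrelid < 64);
    return (modified & (UINT64_C(1) << scanrelid)) == 0;
}

/* Compute the flags argument the way each call site in the diff does. */
static uint32_t
scan_flags_for(ModifiedRelids modified, unsigned scanrelid)
{
    return scan_rel_is_read_only(modified, scanrelid) ?
        SO_HINT_REL_READ_ONLY : 0;
}
```

With RT index 1 marked as modified, scan_flags_for() returns 0 for a scan of index 1 and SO_HINT_REL_READ_ONLY for a scan of index 2.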



  [text/x-patch] v44-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 10-v44-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 273a8d32ec46ac37286b1952f164b6290cee6c66 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v44 09/10] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which
triggers a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fc5345e1dff..36260897503 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +926,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 47cbf2a20cf..bfc7d482827 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM on-access.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
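
For readers following the heuristic added to heap_page_will_set_vm() in the patch above, here is a minimal standalone sketch of just the decision. It is a model under stated assumptions, not the patch's code: VmDecisionInput and will_set_vm() are hypothetical names, the two boolean fields stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(), and the bookkeeping the real function does (clearing set_all_visible/set_all_frozen, filling in new_vmbits) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the real PruneReason enum, enough for this sketch. */
typedef enum
{
    PRUNE_ON_ACCESS,
    PRUNE_VACUUM_SCAN
} PruneReason;

/* Hypothetical flattened view of the state the real function reads. */
typedef struct
{
    bool attempt_set_vm;   /* caller passed HEAP_PAGE_PRUNE_SET_VM */
    bool set_all_visible;  /* page ended up all-visible after pruning */
    bool buffer_dirty;     /* stands in for BufferIsDirty() */
    bool needs_fpi;        /* stands in for XLogCheckBufferNeedsBackup() */
} VmDecisionInput;

static bool
will_set_vm(const VmDecisionInput *in, PruneReason reason,
            bool do_prune, bool do_freeze)
{
    assert(in != NULL);

    if (!in->attempt_set_vm)
        return false;
    if (!in->set_all_visible)
        return false;

    /*
     * On-access and otherwise a no-op: skip the VM update if it would
     * newly dirty the heap page, or if the page is already dirty but the
     * WAL record would need a full-page image.
     */
    if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
        (!in->buffer_dirty || in->needs_fpi))
        return false;

    return true;
}
```

The point of the extra do_prune/do_freeze arguments is that once the page is being modified anyway, setting the VM rides along with that change, so the cost check only applies when setting the VM would be the sole modification.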



  [text/x-patch] v44-0010-Set-pd_prune_xid-on-insert.patch (8.8K, 11-v44-0010-Set-pd_prune_xid-on-insert.patch)
From 05279abac3bf623887c1e4883d360116ad5538b0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v44 10/10] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set the VM all-visible
after a page is filled with newly inserted tuples the first time it is
read. This means the page will get set all-visible when it is still in
shared buffers and avoid potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also enables index-only
scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 36260897503..4e21c6f94ea 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
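
The condition the patch above adds to heap_insert() can be modeled in isolation. The sketch below is an illustration under assumptions, not the patch's code: note_insert() and set_prunable() are hypothetical names, SKETCH_INSERT_FROZEN is a made-up bit standing in for HEAP_INSERT_FROZEN, and set_prunable() follows my understanding of the existing PageSetPrunable() macro (keep the oldest XID recorded so far).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical option bit standing in for HEAP_INSERT_FROZEN. */
#define SKETCH_INSERT_FROZEN (1 << 2)

/* Bootstrap and frozen XIDs sort below FirstNormalTransactionId. */
static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware comparison; only meaningful for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Model of PageSetPrunable(): remember the oldest candidate XID. */
static void
set_prunable(TransactionId *pd_prune_xid, TransactionId xid)
{
    assert(xid_is_normal(xid));
    if (*pd_prune_xid == InvalidTransactionId ||
        xid_precedes(xid, *pd_prune_xid))
        *pd_prune_xid = xid;
}

/* Mirror of the check added in heap_insert(). */
static void
note_insert(TransactionId *pd_prune_xid, TransactionId xid, int options)
{
    if (xid_is_normal(xid) && !(options & SKETCH_INSERT_FROZEN))
        set_prunable(pd_prune_xid, xid);
}
```

Bootstrap-mode and frozen inserts leave the hint untouched, while an ordinary insert makes the page a candidate for on-access pruning, which is what later gives a read-only scan the chance to set the page all-visible.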




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-24 06:53  Kirill Reshke <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Kirill Reshke @ 2026-03-24 06:53 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, 24 Mar 2026 at 02:54, Melanie Plageman
<[email protected]> wrote:
>
> On Sun, Mar 22, 2026 at 3:58 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > I've pushed the first two patches. Attached are the remaining 10. No
> > changes were made to those from the previous version.
>
> I'm planning on pushing 0001-0005 in the morning.
>

Thanks for taking care of this. I think it would be good to get the
WAL volume reduction from 0004 and 0005 into v19. LGTM.


-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-24 17:53  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Andres Freund @ 2026-03-24 17:53 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-03-23 17:54:13 -0400, Melanie Plageman wrote:
> I've made some significant changes to 0006 and realized I need some
> help. 0006 tracks what relations are modified by a query. This new
> version (v44) uses relation oids instead of rt indexes to handle cases
> where the same relation appears more than once in the range table
> (e.g. INSERT INTO foo SELECT * FROM foo; foo appears twice). It
> computes modifiedRelOids (a list of relation OIDs modified by the
> query) in the planner and stores them in the PlannedStmt. There is one
> big issue I'm not sure how to solve:

I'm not entirely sure this is something we need to catch and therefore not
sure that modifiedRelOids is worth the trouble over just having the RT
indexes.


> For queries like INSERT INTO ptable SELECT * FROM ptable, where ptable
> is a partitioned table, though we scan ptable, we don't know when
> executing that scan that we will then modify ptable with the insert.

But does that matter? If such a query inserts a meaningful number of rows,
it's going to insert into different pages than the ones you selected from,
isn't it?


> In my patch, I've added find_all_inheritors() when populating
> modifiedRelOids, but I realize this probably isn't acceptable to add
> to planner from a performance perspective.

Agreed.


Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-24 23:44  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-24 23:44 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Attached v45 is what remains in the patchset (I committed the rest).

On Tue, Mar 24, 2026 at 1:53 PM Andres Freund <[email protected]> wrote:
>
> On 2026-03-23 17:54:13 -0400, Melanie Plageman wrote:
> > I've made some significant changes to 0006 and realized I need some
> > help. 0006 tracks what relations are modified by a query. This new
> > version (v44) uses relation oids instead of rt indexes to handle cases
> > where the same relation appears more than once in the range table
> > (e.g. INSERT INTO foo SELECT * FROM foo; foo appears twice). It
> > computes modifiedRelOids (a list of relation OIDs modified by the
> > query) in the planner and stores them in the PlannedStmt. There is one
> > big issue I'm not sure how to solve:
>
> I'm not entirely sure this is something we need to catch and therefore not
> sure that modifiedRelOids is worth the trouble over just having the RT
> indexes.

Is the disadvantage you see in saving the OIDs the space they take? I
guess it is also worse (from a semantic perspective) to use OIDs if the
set is incomplete, for example because of the insert-into-leaf-partition
case. If they are RT indexes, then it is accurate to say that the set
includes all RT indexes for modified rels.

For INSERT INTO foo SELECT * FROM foo, if the pages are mostly full,
setting pages all-visible during the scan won't hurt because we will
insert at the end of the table. And if there is freespace throughout,
we won't do on-access pruning. So, I actually don't think we could end
up setting and unsetting the VM for every page.

In v45, I've gone back to RT indexes.
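
To make the difference between the two approaches concrete, here is a minimal sketch of how each would classify the scan side of INSERT INTO foo SELECT * FROM foo. Everything in it is hypothetical (SketchRTE, the helper names, the OID value, and the RT index numbering) and row marks are ignored; it only illustrates the bookkeeping, not the planner code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical, stripped-down range table entry. */
typedef struct
{
    Oid  relid;          /* relation OID */
    bool is_result_rel;  /* target of INSERT/UPDATE/DELETE/MERGE */
} SketchRTE;

/* RT-index tracking: only the entry being scanned matters. */
static bool
read_only_by_rti(const SketchRTE *rtable, size_t nrte, size_t rti)
{
    assert(rti >= 1 && rti <= nrte);    /* RT indexes are 1-based */
    return !rtable[rti - 1].is_result_rel;
}

/* OID tracking: any modified entry with the same OID disqualifies the scan. */
static bool
read_only_by_oid(const SketchRTE *rtable, size_t nrte, size_t rti)
{
    assert(rti >= 1 && rti <= nrte);

    Oid relid = rtable[rti - 1].relid;

    for (size_t i = 0; i < nrte; i++)
    {
        if (rtable[i].is_result_rel && rtable[i].relid == relid)
            return false;
    }
    return true;
}
```

With foo appearing once as the insert target and once as the scanned relation, the RT-index version treats the scan as read-only while the OID version does not. As argued above, the former should be harmless for this query shape.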

- Melanie


Attachments:

  [text/x-patch] v45-0001-Track-which-relations-are-modified-by-a-query.patch (5.9K, 2-v45-0001-Track-which-relations-are-modified-by-a-query.patch)
From 67acc2e3ea3a3227e68a85c500ec8104a8c5b812 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v45 1/5] Track which relations are modified by a query

Save the range table indexes of modified relations in a Bitmapset in the
PlannedStmt. A later commit will use this information during scans to
control whether or not on-access pruning is allowed to set the
visibility map. Setting the visibility map during a scan is
counterproductive if the query is going to modify the page immediately
after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this set is only used as a hint to
avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  3 +++
 src/backend/executor/nodeModifyTable.c | 14 ++++++++++++++
 src/backend/optimizer/plan/planner.c   | 21 ++++++++++++++++++++-
 src/include/nodes/plannodes.h          |  6 ++++++
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..38a43315f11 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,9 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..4c64589b421 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,14 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * If this is a leaf partition we just found, it won't have a valid range
+	 * table index.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1531,9 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2216,9 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..86dea1c9cb8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.
+	 *
+	 * This isn't exactly what the executor will actually modify/lock at
+	 * runtime. Runtime partition pruning may eliminate some result relations
+	 * and parent row marks included here may be skipped by the executor.
+	 * Conversely, leaf partitions whose result relations are created at the
+	 * time of insert are not included here.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		modifiedRelids = bms_add_member(modifiedRelids,
+										((PlanRowMark *) lfirst(lc))->rti);
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..a9cf9dd0f29 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0



  [text/x-patch] v45-0002-Thread-flags-through-begin-scan-APIs.patch (32.8K, 3-v45-0002-Thread-flags-through-begin-scan-APIs.patch)
From 84cfddea0ad7b2362f6c47ac68575f2d004edc55 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v45 2/5] Thread flags through begin-scan APIs

Add a caller-provided flags parameter to the table_beginscan_* wrappers,
index_beginscan(), index_beginscan_parallel(),
table_index_fetch_begin(), and the table AM callback
index_fetch_begin(). This allows callers to pass additional context to
be used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
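Note for readers: the pattern in table_beginscan_common() is "each
wrapper picks its internal flags, the caller's flags are checked against
a mask of internal bits, then the two are OR'd together". Below is a
minimal standalone sketch of that pattern. The TOY_* names and bit values
are illustrative assumptions, not PostgreSQL's ScanOptions, and
TOY_SO_HINT_EXAMPLE is a made-up caller-settable flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits; the values are illustrative only. */
enum
{
	TOY_SO_TYPE_SEQSCAN = 1 << 0,
	TOY_SO_ALLOW_PAGEMODE = 1 << 1,
	TOY_SO_TEMP_SNAPSHOT = 1 << 2,
	TOY_SO_HINT_EXAMPLE = 1 << 3	/* hypothetical caller-settable flag */
};

/* Bits that only the begin-scan wrappers themselves may set. */
#define TOY_SO_INTERNAL_FLAGS \
	(TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE | TOY_SO_TEMP_SNAPSHOT)

/* True if the caller passed no internal bits. */
static inline bool
toy_user_flags_valid(uint32_t user_flags)
{
	return (user_flags & (uint32_t) TOY_SO_INTERNAL_FLAGS) == 0;
}

/* Mirrors the shape of table_beginscan_common(): validate, then merge. */
static inline uint32_t
toy_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert(toy_user_flags_valid(user_flags));
	return internal_flags | user_flags;
}

/* Mirrors the shape of table_beginscan(): the wrapper owns the scan type. */
static inline uint32_t
toy_beginscan(uint32_t user_flags)
{
	uint32_t	internal_flags = TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;

	return toy_beginscan_common(internal_flags, user_flags);
}
```

Keeping the two flag sets in separate parameters until the final merge
means a caller that accidentally passes a scan-type bit trips the
assertion instead of silently changing the scan type.
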
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  1 +
 src/backend/access/gin/gininsert.c        |  1 +
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  6 +-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 19 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  9 ++-
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  1 +
 src/backend/executor/nodeIndexonlyscan.c  |  3 +
 src/backend/executor/nodeIndexscan.c      |  4 ++
 src/backend/executor/nodeSamplescan.c     |  1 +
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  5 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  2 +
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 68 +++++++++++++++--------
 26 files changed, 111 insertions(+), 62 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..95fad61fa9e 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, 0, GetActiveSnapshot(), 0, NULL);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..79a79bea1c6 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,6 +2844,7 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
+									0,
 									ParallelTableScanFromBrinShared(brinshared));
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..32167d03137 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,6 +2068,7 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
+									0,
 									ParallelTableScanFromGinBuildShared(ginshared));
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..951273a4d7f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +775,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, 0, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..ae754503007 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -256,6 +256,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -592,6 +593,7 @@ index_parallelrescan(IndexScanDesc scan)
  */
 IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
+						 uint32 flags,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
 						 ParallelIndexScanDesc pscan)
@@ -616,7 +618,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..98e9410c579 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									0, ParallelTableScanFromBTShared(btshared));
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..32bd3fdb7a5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, uint32 flags, ParallelTableScanDesc pscan)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
+								  uint32 flags,
 								  ParallelTableScanDesc pscan)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..390b4260ada 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, 0, GetActiveSnapshot(), 0, NULL);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..14d808671c5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, 0, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, 0, snapshot, 0, NULL);
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22881,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, 0, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23345,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, 0, snapshot, 0, NULL);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..8c5d5e708a1 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, 0, snapshot, 0, NULL);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, 0, snapshot, 0, NULL);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..3ef4d5d8bb2 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, 0, &snap, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, 0, SnapshotAny, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..7e2c1b7467b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -146,6 +146,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	{
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
+							   0,
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..5cacb4b215a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -791,6 +792,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
+								 0, /* flags */
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
@@ -857,6 +859,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
+								 0, /* flags */
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..aaef31dbbad 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1727,6 +1729,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
+								 0, /* flags */
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
@@ -1791,6 +1794,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
+								 0, /* flags */
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cf4dd6a16b4 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -294,6 +294,7 @@ tablesample_init(SampleScanState *scanstate)
 	{
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
+									 0,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..376e877e87c 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -70,7 +70,7 @@ SeqNext(SeqScanState *node)
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
+								   0, estate->es_snapshot,
 								   0, NULL);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..bacd7aa5bc4 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -243,6 +243,7 @@ TidRangeNext(TidRangeScanState *node)
 		if (scandesc == NULL)
 		{
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
+												0,
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid);
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  0, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  0, pscan);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..919df5eef0a 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, 0, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..24b2fda51df 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -182,6 +183,7 @@ extern void index_parallelscan_initialize(Relation heapRelation,
 extern void index_parallelrescan(IndexScanDesc scan);
 extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
+											  uint32 flags,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e1f90f2b6a7 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -893,13 +909,14 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys.
  */
 static inline TableScanDesc
-table_beginscan(Relation rel, Snapshot snapshot,
+table_beginscan(Relation rel, uint32 flags, Snapshot snapshot,
 				int nkeys, ScanKeyData *key)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -938,12 +955,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  * make it worth using the same data structure.
  */
 static inline TableScanDesc
-table_beginscan_bm(Relation rel, Snapshot snapshot,
+table_beginscan_bm(Relation rel, uint32 flags, Snapshot snapshot,
 				   int nkeys, ScanKeyData *key)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -954,21 +972,22 @@ table_beginscan_bm(Relation rel, Snapshot snapshot,
  * also allows control of whether page-mode visibility checking is used.
  */
 static inline TableScanDesc
-table_beginscan_sampling(Relation rel, Snapshot snapshot,
+table_beginscan_sampling(Relation rel, uint32 flags, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
 						 bool allow_pagemode)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1057,14 +1076,15 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
  * for a TID range scan.
  */
 static inline TableScanDesc
-table_beginscan_tidrange(Relation rel, Snapshot snapshot,
+table_beginscan_tidrange(Relation rel, uint32 flags, Snapshot snapshot,
 						 ItemPointer mintid,
 						 ItemPointer maxtid)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,6 +1159,7 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
+											  uint32 flags,
 											  ParallelTableScanDesc pscan);
 
 /*
@@ -1149,6 +1170,7 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
+													   uint32 flags,
 													   ParallelTableScanDesc pscan);
 
 /*
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v45-0003-Pass-down-information-on-table-modification-to-s.patch (11.6K, 4-v45-0003-Pass-down-information-on-table-modification-to-s.patch)
From 7b400e9ccd7d8d83358bd503b7209c8ed1ec7ea3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v45 3/5] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.
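
As an illustration only, the hint derivation can be modeled outside
PostgreSQL roughly as follows. This is a standalone sketch, not code from
the patch: the toy_ names, the 64-bit mask standing in for the Bitmapset
of modified relids, and the flag values are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; hypothetical stand-ins for ScanOptions values. */
#define TOY_SO_TYPE_SEQSCAN			(UINT32_C(1) << 0)
#define TOY_SO_HINT_REL_READ_ONLY	(UINT32_C(1) << 10)

/*
 * Toy replacement for a Bitmapset: bit i is set if the relation with
 * range-table index i is modified by the query. Hypothetical helper.
 */
static bool
toy_rel_is_read_only(uint64_t modified_relids, unsigned scanrelid)
{
	assert(scanrelid < 64);
	return (modified_relids & (UINT64_C(1) << scanrelid)) == 0;
}

/* Caller side: pass the hint only when the relation is not modified. */
static uint32_t
toy_scan_user_flags(uint64_t modified_relids, unsigned scanrelid)
{
	return toy_rel_is_read_only(modified_relids, scanrelid) ?
		TOY_SO_HINT_REL_READ_ONLY : 0;
}

/* Scan-begin side: OR the caller's hint into the internal flags. */
static uint32_t
toy_beginscan_flags(uint32_t internal_flags, uint32_t user_flags)
{
	return internal_flags | user_flags;
}
```

In the patch itself the same decision is made by ScanRelIsReadOnly(), and
the resulting flag is handed to the table_beginscan_* and
index_beginscan* calls.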

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          |  8 ++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  5 ++++-
 src/backend/executor/nodeIndexonlyscan.c  | 11 ++++++++---
 src/backend/executor/nodeIndexscan.c      | 16 ++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 14 +++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..d2ffe28e010 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,14 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
 }
 
+/* Return true if the scan node's relation is not modified by the query */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7e2c1b7467b..dba6c31d188 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,9 +144,12 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
-							   0,
+							   flags,
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5cacb4b215a..88491249a9a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -792,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
@@ -859,7 +863,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index aaef31dbbad..16ec455a964 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1729,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
@@ -1794,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cf4dd6a16b4..b6a02072da5 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,9 +292,12 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
-									 0,
+									 flags,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 376e877e87c..2d0993a83f4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,12 +65,15 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   0, estate->es_snapshot,
+								   flags, estate->es_snapshot,
 								   0, NULL);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
@@ -368,14 +371,17 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, flags, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +411,10 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, flags, pscan);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index bacd7aa5bc4..05ed5364238 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,8 +242,11 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
-												0,
+												flags,
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid);
@@ -453,15 +456,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  0, pscan);
+										  flags, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -491,9 +497,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  0, pscan);
+										  flags, pscan);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e1f90f2b6a7..a8fd8f0d45c 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0



  [text/x-patch] v45-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 5-v45-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 902a8522f213ffd0b5aae486740da3d6141c98b3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v45 4/5] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which
triggers a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.
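
The decision of whether to set the VM can be sketched in isolation as
below. This is a simplified model under stated assumptions, not the
patch's code: ToyPruneState and toy_will_set_vm() are hypothetical names,
and the buffer state that the real code reads via BufferIsDirty() and
XLogCheckBufferNeedsBackup() is passed in as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the PruneState fields involved. */
typedef struct ToyPruneState
{
	bool		attempt_set_vm;		/* caller asked for VM setting */
	bool		set_all_visible;	/* page found to be all-visible */
	bool		set_all_frozen;		/* page found to be all-frozen */
} ToyPruneState;

/*
 * Return true if the VM should be set. For an on-access call that neither
 * prunes nor freezes, decline when setting the VM would newly dirty the
 * heap page, or would require a full-page image of an already-dirty page.
 */
static bool
toy_will_set_vm(ToyPruneState *st, bool on_access,
				bool do_prune, bool do_freeze,
				bool buffer_dirty, bool needs_fpi)
{
	if (!st->attempt_set_vm)
		return false;

	if (!st->set_all_visible)
		return false;

	if (on_access && !do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi))
	{
		/* Too expensive for a read-only access; leave it to vacuum. */
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	return true;
}
```

Note that the cost check applies only to the on-access case; in this
model a vacuum caller that requested VM setting proceeds whenever the
page is all-visible.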

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 951273a4d7f..5c2faaf2340 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..d83fd26b274 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +926,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..2fc4462050a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans in queries that do not modify the underlying heap table,
+	 * on-access pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v45-0005-Set-pd_prune_xid-on-insert.patch (8.8K, 6-v45-0005-Set-pd_prune_xid-on-insert.patch)
From 7633073a6866d4cb94c8722a547ba49e68950bb0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v45 5/5] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after being filled with newly
inserted tuples. This means the page can be set all-visible while it is
still in shared buffers, avoiding potential I/O amplification when
vacuum later has to scan the page and set it all-visible. It also
enables index-only scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
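
A rough standalone model of the insert-time rule follows. It is a sketch
under assumptions rather than the patch's code: the toy_ names are
hypothetical, a bare 32-bit integer stands in for the page header's
pd_prune_xid, and XID ordering is approximated with the usual
modulo-2^32 comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID			((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID	((ToyXid) 3)

/* Wraparound-aware "a is older than b"; assumes both XIDs are normal. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID seen as the prune hint. */
static void
toy_page_set_prunable(ToyXid *pd_prune_xid, ToyXid xid)
{
	if (*pd_prune_xid == TOY_INVALID_XID ||
		toy_xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/*
 * Insert-time rule: skip the hint for non-normal XIDs (bootstrap mode)
 * and for frozen inserts, which need no later pruning or freezing.
 */
static void
toy_insert_set_hint(ToyXid *pd_prune_xid, ToyXid xid, bool insert_frozen)
{
	if (xid >= TOY_FIRST_NORMAL_XID && !insert_frozen)
		toy_page_set_prunable(pd_prune_xid, xid);
}
```

With the hint set, a later read of the page has a reason to attempt
pruning, which is what gives the on-access path a chance to set the VM.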

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..cdaf57e3f12 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4152,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d83fd26b274..bb364f53a44 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
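
The hint-setting rule that the patch above adds to heap_insert() can be sketched on its own. This is a simplified illustration, not the patch's code: DemoPage, DEMO_INSERT_FROZEN, and the demo_* helpers are hypothetical stand-ins for the page header, HEAP_INSERT_FROZEN, and PageSetPrunable(), and the sketch assumes PageSetPrunable() keeps the oldest xid it has been given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical stand-in for HEAP_INSERT_FROZEN; the value is illustrative. */
#define DEMO_INSERT_FROZEN	0x0004

/* Hypothetical page holding only the field this sketch needs. */
typedef struct DemoPage
{
	TransactionId pd_prune_xid;
} DemoPage;

/* Bootstrap and frozen xids sort below the first normal xid. */
static bool
demo_xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b", valid for normal xids only. */
static bool
demo_xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Assumed PageSetPrunable() behavior: remember the oldest xid seen. */
static void
demo_page_set_prunable(DemoPage *page, TransactionId xid)
{
	if (page->pd_prune_xid == InvalidTransactionId ||
		demo_xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* Skip the hint in bootstrap mode and for frozen inserts. */
static void
demo_insert_set_hint(DemoPage *page, TransactionId xid, int options)
{
	if (demo_xid_is_normal(xid) && !(options & DEMO_INSERT_FROZEN))
		demo_page_set_prunable(page, xid);
}
```

Under these assumptions, calling demo_insert_set_hint() with a frozen insert or a non-normal xid leaves pd_prune_xid untouched, so heap_page_prune_opt() would still bail out early for such pages.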




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-25 18:54  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-03-25 18:54 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 25, 2026 at 2:02 PM Tomas Vondra <[email protected]> wrote:
>
> 0002
>
> - Don't we usually keep "flags" as the last parameter? It seems a bit
> weird that it's added in between relation and snapshot.

In an earlier review, Andres said he disliked using flags as the last
parameter for index_beginscan() because its current last two
parameters are integers (nkeys and norderbys), which could be
confusing. Personally, I think you have to look at the function
signature before just randomly passing stuff, and so it shouldn't
matter -- but I didn't care enough to argue. If you agree with me that
they should be last, then it's two against one and I'll change it back
:) I can keep the callsite comments naming the flags parameter.

> - Do we really want to pass two sets of flags to table_beginscan_common?
>  I realize it's done to ensure "users" don't use internal flags, but
> then maybe it'd be better to do that check in the places calling the
> _common? Someone adding a new caller can break this in various ways
> anyway, e.g. by setting bits in the internal flags, no?

Yes, callers of table_beginscan_common() could pass flags they
shouldn't in internal_flags. But I was mostly trying to prevent the
case where a user picks a flag that overlaps with an internal flag,
conditionally passes it as a user flag, and then when they test for it
in their AM-specific code, they aren't actually checking if their own
flag is set.

Anyway, it's not hard to move:
    Assert((flags & SO_INTERNAL_FLAGS) == 0);
into the table_beginscan_common() callers and then pass the internal
flags the caller wants to pass + the user specified flags to
table_beginscan_common(). And I think that fixes what you are talking
about?

> If we want to have these checks, should we be more thorough? Should we
> check the internal flags only set internal flags?

That's easy enough too. I think
    Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0);
does the trick.

I think this would largely be the same as having
table_beginscan_common() callers validate that the user-passed flags
are not internal and then OR them together with the internal flags
they want to pass to table_beginscan_common().

I'm trying to think of cases where the two approaches would differ so
I can decide which to do.
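
To make the comparison concrete, here is a minimal sketch of the two checks side by side. The DEMO_* flag names and bit values are placeholders invented for illustration (the real SO_* definitions are not reproduced here), and the helper functions are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit assignments; only the internal/user split matters. */
#define DEMO_SO_TYPE_SEQSCAN	(1u << 0)	/* internal */
#define DEMO_SO_ALLOW_STRAT		(1u << 1)	/* internal */
#define DEMO_SO_INTERNAL_FLAGS	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_STRAT)
#define DEMO_SO_USER_HINT		(1u << 8)	/* hypothetical user flag */

/*
 * Approach A: the common function receives both sets and checks that
 * neither one strays into the other's bit range.
 */
static bool
demo_flags_valid_two_sets(uint32_t internal_flags, uint32_t user_flags)
{
	return (user_flags & DEMO_SO_INTERNAL_FLAGS) == 0 &&
		(internal_flags & ~DEMO_SO_INTERNAL_FLAGS) == 0;
}

/*
 * Approach B: each wrapper validates only the user-passed flags, then ORs
 * them with the internal flags it chose itself.
 */
static bool
demo_flags_valid_caller_side(uint32_t user_flags)
{
	return (user_flags & DEMO_SO_INTERNAL_FLAGS) == 0;
}

/* Combine the two sets into the single value the scan would store. */
static uint32_t
demo_combine_flags(uint32_t internal_flags, uint32_t user_flags)
{
	assert(demo_flags_valid_two_sets(internal_flags, user_flags));
	return internal_flags | user_flags;
}
```

In this sketch the only input the two approaches treat differently is a wrapper that puts a non-internal bit into internal_flags: approach A rejects it, while approach B never looks at that value.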

> 0003
>
> - Half the "beginscan" calls use a ternary operator directly, half sets
> a variable first (and then uses that). Often mixed in the same file.
> Shouldn't it be a bit consistent?

Indeed.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-25 23:14  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-25 23:14 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 25, 2026 at 2:54 PM Melanie Plageman
<[email protected]> wrote:
>
> I'm trying to think of cases where the two approaches would differ so
> I can decide which to do.
>
> > 0003
> >
> > - Half the "beginscan" calls use a ternary operator directly, half sets
> > a variable first (and then uses that). Often mixed in the same file.
> > Shouldn't it be a bit consistent?
>
> Indeed.

Attached v46 addresses your feedback and has a bit of assorted cleanup in it.

I started wondering if table_beginscan_strat() is a bit weird now
because it has two boolean arguments that are basically just
SO_ALLOW_STRAT and SO_ALLOW_SYNC -- so those are kind of letting the
user set "internal" flags. Anyway, I'm not sure we should do anything
about it, but it got me thinking.
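
For illustration, this is roughly what I mean by the booleans standing in for flags. The DEMO_* names and bit values are placeholders rather than the real SO_* definitions, and demo_strat_flags() is a hypothetical helper, not the actual wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define DEMO_SO_TYPE_SEQSCAN	(1u << 0)
#define DEMO_SO_ALLOW_STRAT		(1u << 1)
#define DEMO_SO_ALLOW_SYNC		(1u << 2)

/*
 * Translate the two boolean arguments into scan flags, the way a
 * table_beginscan_strat()-style wrapper would before calling the common
 * begin-scan function.
 */
static uint32_t
demo_strat_flags(bool allow_strat, bool allow_sync)
{
	uint32_t	flags = DEMO_SO_TYPE_SEQSCAN;

	if (allow_strat)
		flags |= DEMO_SO_ALLOW_STRAT;
	if (allow_sync)
		flags |= DEMO_SO_ALLOW_SYNC;

	return flags;
}
```

So the caller is effectively choosing two bits that the flags-based wrappers otherwise treat as internal.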

- Melanie


Attachments:

  [text/x-patch] v46-0001-Track-which-relations-are-modified-by-a-query.patch (6.3K, 2-v46-0001-Track-which-relations-are-modified-by-a-query.patch)
From 4216d588438aacd4023801869edc464dc2cb0921 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v46 1/5] Track which relations are modified by a query

Save the range table indexes of relations modified by a query in a
bitmap in the PlannedStmt. This is derived from existing PlannedStmt
members listing row marks and result relations, but precomputing it
allows cheap membership checks during execution.

A later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map -- which
would be counterproductive if the query will modify the page.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE and non-locking marks like ROW_MARK_REFERENCE). Since
this bitmap is used to avoid unnecessary work, it is okay for it to be
conservative.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  3 +++
 src/backend/executor/nodeModifyTable.c | 20 ++++++++++++++++++++
 src/backend/optimizer/plan/planner.c   | 21 ++++++++++++++++++++-
 src/include/nodes/plannodes.h          |  6 ++++++
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..38a43315f11 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,9 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..b22264c343b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,14 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * If this is a leaf partition we just found, it won't have a valid range
+	 * table index.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1531,9 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2216,15 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	/*
+	 * Tuple routing for cross-partition updates or ON CONFLICT ... DO UPDATE
+	 * may open leaf partitions not in the range table, in which case
+	 * ri_RangeTableIndex is 0.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..de10b6fb413 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.
+	 *
+	 * This isn't exactly what the executor will actually modify/lock at
+	 * runtime. Runtime partition pruning may eliminate some result relations
+	 * and some rowmarks are included that may not result in table
+	 * modification. Conversely, leaf partitions whose result relations are
+	 * created at the time of insert are not included here.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		modifiedRelids = bms_add_member(modifiedRelids,
+										((PlanRowMark *) lfirst(lc))->rti);
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..a9cf9dd0f29 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0
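
The modifiedRelids computation in the patch above can be sketched without the planner. This is a simplified illustration under stated assumptions: a 64-bit mask (DemoRelids) stands in for Bitmapset, plain int arrays stand in for the result-relation and row-mark lists, range table indexes are assumed to be in 1..63, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: one bit per range table index. */
typedef uint64_t DemoRelids;

/* Return the set with rti added, like bms_add_member() would. */
static DemoRelids
demo_add_member(DemoRelids set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

/* Membership test, mirroring the argument order of bms_is_member(). */
static bool
demo_is_member(int rti, DemoRelids set)
{
	assert(rti > 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/*
 * Union the result relations with every relation that has a row mark.
 * The result is deliberately conservative: a row mark counts as
 * "modified" even if no tuple ends up being locked or changed.
 */
static DemoRelids
demo_compute_modified(const int *result_rels, size_t nresult,
					  const int *rowmark_rtis, size_t nrowmarks)
{
	DemoRelids	modified = 0;

	for (size_t i = 0; i < nresult; i++)
		modified = demo_add_member(modified, result_rels[i]);
	for (size_t i = 0; i < nrowmarks; i++)
		modified = demo_add_member(modified, rowmark_rtis[i]);

	return modified;
}
```

A scan node could then decide cheaply whether its relation is read-only for the query by calling demo_is_member() with its own range table index.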



  [text/x-patch] v46-0002-Thread-flags-through-begin-scan-APIs.patch (34.0K, 3-v46-0002-Thread-flags-through-begin-scan-APIs.patch)
From 384753f272cbe60e77299b97153801d3a448d33b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v46 2/5] Thread flags through begin-scan APIs

Add a user-settable flags parameter to the table_beginscan_* wrappers,
index_beginscan(), table_index_fetch_begin(), and the table
AM callback index_fetch_begin(). This allows users to pass additional
context to be used when building the scan descriptors.

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM stores the caller-provided flags there in
heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  3 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  6 +-
 src/backend/access/index/indexam.c        | 10 ++--
 src/backend/access/nbtree/nbtsort.c       |  3 +-
 src/backend/access/table/tableam.c        | 22 +++----
 src/backend/commands/constraint.c         |  3 +-
 src/backend/commands/copyto.c             |  3 +-
 src/backend/commands/tablecmds.c          | 13 ++--
 src/backend/commands/typecmds.c           |  6 +-
 src/backend/executor/execIndexing.c       |  4 +-
 src/backend/executor/execReplication.c    | 14 +++--
 src/backend/executor/nodeBitmapHeapscan.c |  3 +-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++-
 src/backend/executor/nodeIndexscan.c      | 12 ++--
 src/backend/executor/nodeSamplescan.c     |  3 +-
 src/backend/executor/nodeSeqscan.c        |  9 ++-
 src/backend/executor/nodeTidrangescan.c   |  7 ++-
 src/backend/partitioning/partbounds.c     |  3 +-
 src/backend/utils/adt/selfuncs.c          |  3 +-
 src/include/access/genam.h                |  6 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 73 +++++++++++++++--------
 26 files changed, 152 insertions(+), 84 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..75ad379190f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,8 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+						   0 /* flags */ );
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..536493fa38a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0 /* flags */ );
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..d4e9c9ed950 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0 /* flags */ );
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..8c7695ebfb9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0,
+									0 /* flags */ );
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +774,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL,
+									0 /* flags */ );
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..03a243345bc 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0,
+										 0 /* flags */ );
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +717,8 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0,
+									 0 /* flags */ );
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..13cdbb86cd7 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -258,7 +258,8 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -594,7 +595,8 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -616,7 +618,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..3c444ece216 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									0 /* flags */ );
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..3ac4027ce11 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +177,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +186,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +208,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +217,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +250,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0 /* flags */ );
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..1c4f5a25ba4 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,8 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation,
+															0 /* flags */ );
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..e6c237d6d0f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,8 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+							   0 /* flags */ );
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..6dd3aed6b98 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL,
+							   0 /* flags */ );
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13981,8 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, snapshot, 0, NULL,
+						   0 /* flags */ );
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22883,8 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL,
+							   0 /* flags */ );
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23348,8 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL,
+						   0 /* flags */ );
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..115bd77af27 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,8 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   0 /* flags */ );
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3267,8 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   0 /* flags */ );
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..72671013c52 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,9 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index,
+								 &DirtySnapshot, NULL, indnkeyatts, 0,
+								 0 /* flags */ );
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..ca0d1cc6b95 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   &snap, NULL, skey_attoff, 0,
+						   0 /* flags */ );
 
 retry:
 	found = false;
@@ -383,7 +385,8 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL,
+						   0 /* flags */ );
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +605,8 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL,
+						   0 /* flags */ );
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +670,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   SnapshotAny, NULL, skey_attoff, 0,
+						   0 /* flags */ );
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..e58bb02db43 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   0 /* flags */ );
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..f8a6671793f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   0 /* flags */ );
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -794,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +862,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..3df091ac000 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   0 /* flags */ );
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +210,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   0 /* flags */ );
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1730,7 +1732,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1797,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..f0e14e53fab 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,8 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode,
+									 0 /* flags */ );
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..eaa8cfb6a1a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,8 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL,
+								   0 /* flags */ );
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +376,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 0 /* flags */ );
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +410,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 0 /* flags */ );
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..6f63e9f80d0 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,8 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid,
+												0 /* flags */ );
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0 /* flags */ );
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0 /* flags */ );
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..c0f847b43be 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL,
+							   0 /* flags */ );
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..9fbbb6a8ddc 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7178,7 +7178,8 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0,
+								 0 /* flags */ );
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b69320a7fc8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,8 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys,
+									 uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..ce5176bdf69 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,19 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	Assert((flags & ~SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +911,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +946,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +957,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +976,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1001,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1014,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1079,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1160,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1171,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1198,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1210,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
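
For readers skimming the diff above, the flag split in table_beginscan_common() can be sketched in isolation. This is only an illustration under stated assumptions: the DEMO_* names, the bit values, and both demo_* functions are hypothetical stand-ins, not the real ScanOptions bits or tableam.h functions. It shows the idea of rejecting internal bits in caller-supplied flags before merging the two masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ScanOptions bits; values are illustrative only */
enum
{
	DEMO_SO_TYPE_SEQSCAN = 1 << 0,		/* set internally by the wrapper */
	DEMO_SO_ALLOW_PAGEMODE = 1 << 1,	/* set internally by the wrapper */
	DEMO_SO_HINT_REL_READ_ONLY = 1 << 2 /* may be passed by callers */
};

/* Mask of bits that only the begin-scan wrappers are allowed to set */
#define DEMO_SO_INTERNAL_FLAGS \
	(DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE)

/*
 * Merge wrapper-chosen flags with caller-supplied flags, asserting that
 * neither side strays into the other's bit range.
 */
static uint32_t
demo_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert((user_flags & DEMO_SO_INTERNAL_FLAGS) == 0);
	assert((internal_flags & ~DEMO_SO_INTERNAL_FLAGS) == 0);

	return internal_flags | user_flags;
}

/* Wrapper for one scan type: picks the internal bits, forwards user bits */
static uint32_t
demo_beginscan(uint32_t user_flags)
{
	uint32_t	internal_flags = DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_PAGEMODE;

	return demo_beginscan_common(internal_flags, user_flags);
}
```

Calling demo_beginscan(0) yields only the internal bits, while passing DEMO_SO_TYPE_SEQSCAN as a user flag would trip the first assertion in an assert-enabled build.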



  [text/x-patch] v46-0003-Pass-down-information-on-table-modification-to-s.patch (9.5K, 4-v46-0003-Pass-down-information-on-table-modification-to-s.patch)
From 105c2c2c0057ee9945cf6ec1c32061f617f627a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v46 3/5] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          |  8 ++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  3 ++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++++++---
 src/backend/executor/nodeIndexscan.c      | 12 ++++++++----
 src/backend/executor/nodeSamplescan.c     |  3 ++-
 src/backend/executor/nodeSeqscan.c        | 10 +++++++---
 src/backend/executor/nodeTidrangescan.c   | 11 ++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..d2ffe28e010 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,14 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
 }
 
+/* Return true if the scan node's relation is not modified by the query */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index e58bb02db43..7096e6f8645 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -149,7 +149,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL,
-							   0 /* flags */ );
+							   ScanRelIsReadOnly(&node->ss) ?
+							   SO_HINT_REL_READ_ONLY : 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f8a6671793f..3971e54d7da 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -96,7 +96,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -796,7 +797,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -863,7 +865,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3df091ac000..09df10dd78a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -211,7 +212,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1733,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1798,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index f0e14e53fab..98fab36fbdc 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -299,7 +299,8 @@ tablesample_init(SampleScanState *scanstate)
 									 scanstate->use_bulkread,
 									 allow_sync,
 									 scanstate->use_pagemode,
-									 0 /* flags */ );
+									 ScanRelIsReadOnly(&scanstate->ss) ?
+									 SO_HINT_REL_READ_ONLY : 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index eaa8cfb6a1a..2f4c18051cd 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -72,7 +72,8 @@ SeqNext(SeqScanState *node)
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
 								   0, NULL,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,9 +376,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 }
 
 /* ----------------------------------------------------------------
@@ -411,5 +414,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 6f63e9f80d0..f83a72e3635 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -246,7 +246,8 @@ TidRangeNext(TidRangeScanState *node)
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid,
-												0 /* flags */ );
+												ScanRelIsReadOnly(&node->ss) ?
+												SO_HINT_REL_READ_ONLY : 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -461,7 +462,9 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0 /* flags */ );
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : 0);
 }
 
 /* ----------------------------------------------------------------
@@ -495,5 +498,7 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0 /* flags */ );
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : 0);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ce5176bdf69..014c686a5de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
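
The pattern repeated at each scan node above (test membership in the set of modified relations, then pass either the hint or 0) can be sketched standalone. Assumptions to note: a 64-bit mask stands in for the Bitmapset in modifiedRelids, scanrelid is assumed to be below 64, and the demo_* names are hypothetical. Only the hint's bit position (1 << 10) is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position the patch assigns to SO_HINT_REL_READ_ONLY */
#define DEMO_SO_HINT_REL_READ_ONLY (UINT32_C(1) << 10)

/*
 * Hypothetical stand-in for ScanRelIsReadOnly(): bit N of modified_relids
 * is set when range-table index N is modified by the query.  Assumes
 * scanrelid < 64; the real code uses a Bitmapset instead.
 */
static bool
demo_scan_rel_is_read_only(uint64_t modified_relids, unsigned scanrelid)
{
	return (modified_relids & (UINT64_C(1) << scanrelid)) == 0;
}

/* Compute the user flags a scan node would pass to its begin-scan call */
static uint32_t
demo_scan_flags(uint64_t modified_relids, unsigned scanrelid)
{
	return demo_scan_rel_is_read_only(modified_relids, scanrelid) ?
		DEMO_SO_HINT_REL_READ_ONLY : 0;
}
```

With relation 2 marked as modified, a scan of relation 1 would receive the hint and a scan of relation 2 would receive 0.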



  [text/x-patch] v46-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.2K, 5-v46-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From e8f9da0d1ca12ab03cb58e4283dfd4111aa9fc2c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v46 4/5] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which
triggers a write and potentially an FPI. It also allows more frequent
index-only scans, since they require pages to be marked all-visible in
the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              |  3 +-
 5 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8c7695ebfb9..d59b423c8ad 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..d83fd26b274 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +926,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..f2a009141be 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -431,7 +432,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v46-0005-Set-pd_prune_xid-on-insert.patch (8.8K, 6-v46-0005-Set-pd_prune_xid-on-insert.patch)
From 9d6d6c2529700e4fe381dbc55ef172ba13882fab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v46 5/5] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set the VM all-visible
after a page is filled with newly inserted tuples the first time it is
read. This means the page will get set all-visible when it is still in
shared buffers and avoid potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also enables index-only
scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..cdaf57e3f12 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4152,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d83fd26b274..bb364f53a44 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-25 23:29  Tomas Vondra <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Tomas Vondra @ 2026-03-25 23:29 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On 3/25/26 19:54, Melanie Plageman wrote:
> On Wed, Mar 25, 2026 at 2:02 PM Tomas Vondra <[email protected]> wrote:
>>
>> 0002
>>
>> - Don't we usually keep "flags" as the last parameter? It seems a bit
>> weird that it's added in between relation and snapshot.
> 
> In an earlier review, Andres said he disliked using flags as the last
> parameter for index_beginscan() because its current last two
> parameters are integers (nkeys and norderbys), which could be
> confusing. Personally, I think you have to look at the function
> signature before just randomly passing stuff, and so it shouldn't
> matter -- but I didn't care enough to argue. If you agree with me that
> they should be last, then it's two against one and I'll change it back
> :) I can keep the callsite comments naming the flags parameter.
> 

Who am I to argue with Andres? ;-) I'm kinda used to flags being the
last argument, but it's not something I'm particularly attached to.

>> - Do we really want to pass two sets of flags to table_beginscan_common?
>>  I realize it's done to ensure "users" don't use internal flags, but
>> then maybe it'd be better to do that check in the places calling the
>> _common? Someone adding a new caller can break this in various ways
>> anyway, e.g. by setting bits in the internal flags, no?
> 
> Yes, callers of table_beginscan_common() could pass flags they
> shouldn't in internal_flags. But I was mostly trying to prevent the
> case where a user picks a flag that overlaps with an internal flag,
> conditionally passes it as a user flag, and then when they test for it
> in their AM-specific code, they aren't actually checking if their own
> flag is set.
> 

Ah, so we expect people to invent their "own" flags, outside what's in
ScanOptions? Or do I misunderstand how it works? (I admit not reading
the whole massive thread, as I was only interested in using the flags in
my own patch.)

> Anyway, it's not hard to move:
>     Assert((flags & SO_INTERNAL_FLAGS) == 0);
> into the table_beginscan_common() callers and then pass the internal
> flags the caller wants to pass + the user specified flags to
> table_beginscan_common(). And I think that fixes what you are talking
> about?
> 

Right. I wouldn't say it "fixes" it, because it wasn't a bug. But it
does ensure the two sets do not "overlap", which I assume should never
happen.

>> If we want to have these checks, should we be more thorough? Should we
>> check the internal flags only set internal flags?
> 
> That's easy enough too.
> Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0); I think does the trick.
> 
> I think this would largely be the same as having
> table_beginscan_common() callers validate that the user-passed flags
> are not internal and then OR them together with the internal flags
> they want to pass to table_beginscan_common().
> 
> I'm trying to think of cases where the two approaches would differ so
> I can decide which to do.
> 

OK


-- 
Tomas Vondra







* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-26 14:51  Melanie Plageman <[email protected]>
  parent: Tomas Vondra <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-26 14:51 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 25, 2026 at 7:29 PM Tomas Vondra <[email protected]> wrote:
>
> >> - Do we really want to pass two sets of flags to table_beginscan_common?
> >>  I realize it's done to ensure "users" don't use internal flags, but
> >> then maybe it'd be better to do that check in the places calling the
> >> _common? Someone adding a new caller can break this in various ways
> >> anyway, e.g. by setting bits in the internal flags, no?
> >
> > Yes, callers of table_beginscan_common() could pass flags they
> > shouldn't in internal_flags. But I was mostly trying to prevent the
> > case where a user picks a flag that overlaps with an internal flag,
> > conditionally passes it as a user flag, and then when they test for it
> > in their AM-specific code, they aren't actually checking if their own
> > flag is set.
>
> Ah, so we expect people to invent their "own" flags, outside what's in
> ScanOptions? Or do I misunderstand how it works? (I admit not reading
> the whole massive thread, as I was only interested in using the flags in
> my own patch.)

Yes, this isn't really explored in the rest of the thread. I thought
since the flags are threaded all the way through and they can
set/check the flags in the table AM-specific layer, it would make
sense that they could choose flags for their own purposes. They don't
have to wait for consensus on getting a new SO type added. I don't
know if this is a bad idea. However, changing the table AM wrappers
seems more justifiable if we are making them extensible in this way.

> >> If we want to have these checks, should we be more thorough? Should we
> >> check the internal flags only set internal flags?
> >
> > That's easy enough too.
> > Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0); I think does the trick.

I did this in the previously posted v46.
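
For readers following along, the two checks discussed above can be
sketched roughly as below. This is only an illustration: the DEMO_*
names, the flag values, and demo_beginscan_common() are hypothetical
stand-ins and are not the ScanOptions values or the
table_beginscan_common() signature from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout, for illustration only. */
#define DEMO_INTERNAL_A		(1u << 0)	/* set only by the wrappers */
#define DEMO_INTERNAL_B		(1u << 1)	/* set only by the wrappers */
#define DEMO_USER_HINT		(1u << 2)	/* may be passed by callers */

#define DEMO_SO_INTERNAL_FLAGS	(DEMO_INTERNAL_A | DEMO_INTERNAL_B)

/*
 * Returns 1 if user_flags contains no internal bits and internal_flags
 * contains only internal bits, 0 otherwise.
 */
static int
demo_flags_are_valid(uint32_t user_flags, uint32_t internal_flags)
{
	if ((user_flags & DEMO_SO_INTERNAL_FLAGS) != 0)
		return 0;
	if ((internal_flags & ~DEMO_SO_INTERNAL_FLAGS) != 0)
		return 0;
	return 1;
}

/* Stand-in for the common entry point: validate, then combine. */
static uint32_t
demo_beginscan_common(uint32_t user_flags, uint32_t internal_flags)
{
	assert(demo_flags_are_valid(user_flags, internal_flags));
	return user_flags | internal_flags;
}
```

With both checks in place, a caller that accidentally passes an
internal bit as a user flag (or the reverse) trips the assertion
instead of silently testing the wrong bit later.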

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-26 16:07  Tomas Vondra <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Tomas Vondra @ 2026-03-26 16:07 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On 3/26/26 15:51, Melanie Plageman wrote:
> On Wed, Mar 25, 2026 at 7:29 PM Tomas Vondra <[email protected]> wrote:
>>
>>>> - Do we really want to pass two sets of flags to table_beginscan_common?
>>>>  I realize it's done to ensure "users" don't use internal flags, but
>>>> then maybe it'd be better to do that check in the places calling the
>>>> _common? Someone adding a new caller can break this in various ways
>>>> anyway, e.g. by setting bits in the internal flags, no?
>>>
>>> Yes, callers of table_beginscan_common() could pass flags they
>>> shouldn't in internal_flags. But I was mostly trying to prevent the
>>> case where a user picks a flag that overlaps with an internal flag,
>>> conditionally passes it as a user flag, and then when they test for it
>>> in their AM-specific code, they aren't actually checking if their own
>>> flag is set.
>>
>> Ah, so we expect people to invent their "own" flags, outside what's in
>> ScanOptions? Or do I misunderstand how it works? (I admit not reading
>> the whole massive thread, as I was only interested in using the flags in
>> my own patch.)
> 
> Yes, this isn't really explored in the rest of the thread. I thought
> since the flags are threaded all the way through and they can
> set/check the flags in the table AM-specific layer, it would make
> sense that they could choose flags for their own purposes. They don't
> have to wait for consensus on getting a new SO type added. I don't
> know if this is a bad idea. However, changing the table AM wrappers
> seems more justifiable if we are making them extensible in this way.
> 

No idea. Do we have an example of a TAM actually needing this? If not,
I'd probably advise to remove that and keep the patch simpler. My past
attempts to future-proof a patch like this rarely worked.

If we want to give TAMs the opportunity to define custom flags, do we
already do something like that elsewhere? Is there a precedent how to do
that? If we allow the TAM to pick arbitrary flag values, it's easy to
end up with collisions later (if we add a new internal flag). Maybe
there is a way to prevent that? E.g. we could restrict internal flags to
0x0000FFFF, and custom flags to 0xFFFF0000?
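
A minimal sketch of that split, assuming a 32-bit flags word. The
DEMO_* names and helpers are hypothetical and exist only to show the
idea; nothing like this is in the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits reserved for core flags, high 16 bits for table AMs. */
#define DEMO_SO_CORE_MASK	UINT32_C(0x0000FFFF)
#define DEMO_SO_CUSTOM_MASK	UINT32_C(0xFFFF0000)

/* n-th custom flag, n in 0..15; always lands in the high half. */
#define DEMO_SO_CUSTOM_FLAG(n)	(UINT32_C(1) << (16 + (n)))

/* True if every set bit lies in the core range. */
static int
demo_only_core_bits(uint32_t flags)
{
	return (flags & ~DEMO_SO_CORE_MASK) == 0;
}

/* True if every set bit lies in the custom range. */
static int
demo_only_custom_bits(uint32_t flags)
{
	return (flags & ~DEMO_SO_CUSTOM_MASK) == 0;
}
```

Because the two masks are disjoint, adding a new core flag later cannot
collide with a value a table AM has already picked, as long as both
sides stay within their half.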

regards

-- 
Tomas Vondra







* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-26 23:10  David Rowley <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: David Rowley @ 2026-03-26 23:10 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, 26 Mar 2026 at 12:14, Melanie Plageman
<[email protected]> wrote:
> Attached v46 addresses your feedback and has a bit of assorted cleanup in it.

(I've not had a chance to process this thread, so apologies if I
missed discussion on certain things I'm going to say)

I was looking at v46-0001. With:

+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
  */
  Bitmapset  *unprunableRelids;

+ /*
+ * RT indexes of relations modified by the query through
+ * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+ */
+ Bitmapset  *modifiedRelids;
+

This doesn't really mention anything about leaf partitions not being
mentioned for INSERT queries. You did mention it in standard_planner()
here:

+ * modification. Conversely, leaf partitions whose result relations are
+ * created at the time of insert are not included here.

I think if someone is going to use this field, they're going to look
at where the field is defined to find out what it is, not where it
gets populated.

I'm also wondering about having this combined field. If you were to
have a Bitmapset field that mirrors "List *resultRelations;", then
have another:

/* a list of PlanRowMark's */
List   *rowMarks;

+ /* Relids which have rowMarks */
+ Bitmapset *rowMarkRelids;

I think they're more likely to be useful for other purposes, and I
think the only pain that it causes you is that you have to call
bms_is_member() twice in ScanRelIsReadOnly().

Then, as a follow-up, maybe we could consider removing
PlannedStmt.resultRelations.  (The deprecated)
ExecRelationIsTargetRelation() could use the new Bitmapset, which
would be more efficient. OverExplain does do:

if (es->format != EXPLAIN_FORMAT_TEXT ||
plannedstmt->resultRelations != NIL)
overexplain_intlist("Result RTIs", plannedstmt->resultRelations, es);

but maybe Robert is ok with those coming out in ascending numerical
order rather than list order. overexplain_bitmapset() would do that.

In [1], I didn't see any code actually using the field. Just a couple
of projects that have duplicated the copyObject() code.

I did quickly look over the remaining patches. I wondered if you might
want to add a new ScanOption SO_NONE = 0, or SO_EMPTY_FLAGS. It might
make the places where you're passing zero directly easier to read?

David

[1] https://codesearch.debian.net/search?q=resultRelations&literal=1






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-27 19:17  Melanie Plageman <[email protected]>
  parent: David Rowley <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-27 19:17 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Mar 26, 2026 at 7:10 PM David Rowley <[email protected]> wrote:
>
> I was looking at v46-0001. With:

Thanks for taking a look!

> +++ b/src/include/nodes/plannodes.h
> + Bitmapset  *modifiedRelids;
> +
>
> This doesn't really mention anything about leaf partitions not being
> mentioned for INSERT queries. You did mention it in standard_planner()
> here:
>
> I'm also wondering about having this combined field. If you were to
> have a Bitmapset field that mirrors "List *resultRelations;", then
> have another:
>
> /* a list of PlanRowMark's */
> List   *rowMarks;
>
> + /* Relids which have rowMarks */
> + Bitmapset *rowMarkRelids;
>
> I think they're more likely to be useful for other purposes, and I
> think the only pain that it causes you is that you have to call
> bms_is_member() twice in ScanRelIsReadOnly().

Yea, outside of the insert into leaf partitions case, I thought of
another, perhaps even more compelling reason the combined field might
be a bit confusing:
Take a table t1 and a table t2. If you do
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id FOR UPDATE OF t1;
t1 will get a ROW_MARK_EXCLUSIVE and t2 will get a ROW_MARK_REFERENCE
(that's just how preprocess_rowmarks() works).
That means modifiedRelids would contain t2, even though t2 is not
being locked for update. For the purposes of setting the VM, it's
totally fine that we are more conservative than we need to be and
don't consider setting it when scanning t2. But for the purposes of
modifiedRelids, it's a bit confusing that t2 is in there.

But we can't just exclude ROW_MARK_REFERENCE from modifiedRelids
because we rely on ROW_MARK_REFERENCE to avoid setting the VM for a
table we are updating or deleting from when it is mentioned more than
once in the query (e.g. UPDATE foo SET x = 1 FROM foo f2 WHERE foo.id
= f2.id).

So, for that reason and because of the missing leaf partitions for
inserts, I think making quick reference bitmapsets would be better.
I've done this in attached v47.
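
To illustrate the shape of the resulting check, here is a rough,
self-contained sketch. It uses a plain 64-bit word as a stand-in for
Bitmapset, and every Demo*/demo_* name is hypothetical; the real code
would use bms_is_member() on the two PlannedStmt fields, and the actual
ScanRelIsReadOnly() in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t DemoRelids;

typedef struct DemoPlannedStmt
{
	DemoRelids	resultRelationRelids;	/* INSERT/UPDATE/DELETE/MERGE targets */
	DemoRelids	rowMarkRelids;			/* relations with any row mark */
} DemoPlannedStmt;

/* rti must be in 0..63 for this toy representation. */
static DemoRelids
demo_add_member(DemoRelids set, int rti)
{
	return set | (UINT64_C(1) << rti);
}

static bool
demo_is_member(int rti, DemoRelids set)
{
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* Two membership checks: not a result relation and no row mark. */
static bool
demo_scan_rel_is_read_only(const DemoPlannedStmt *stmt, int rti)
{
	return !demo_is_member(rti, stmt->resultRelationRelids) &&
		!demo_is_member(rti, stmt->rowMarkRelids);
}

/* The FOR UPDATE OF t1 example: t1 = RTI 1, t2 = RTI 2, both row-marked. */
static void
demo_for_update_of_t1_example(void)
{
	DemoPlannedStmt stmt = {0, 0};

	stmt.rowMarkRelids = demo_add_member(stmt.rowMarkRelids, 1);
	stmt.rowMarkRelids = demo_add_member(stmt.rowMarkRelids, 2);

	assert(!demo_scan_rel_is_read_only(&stmt, 1));	/* locked for update */
	assert(!demo_scan_rel_is_read_only(&stmt, 2));	/* conservative: reference mark */
	assert(demo_scan_rel_is_read_only(&stmt, 3));	/* untouched relation */
}
```

In this sketch t2 is treated as not read-only simply because it has a
row mark, which matches the conservative behavior described above.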

I've also removed the asserts in ExecInsert/Update/Delete because they
are a bit tautological now.

My one remaining question is whether the two new bitmapsets
(rowMarkRelids and resultRelationRelids) should move from the
PlannedStmt to the EState. They are determined at plan time and never
modified during execution. However, I do notice there are other EState
members that seem like just a copy of info from the PlannedStmt that
isn't modified during execution (e.g. es_rteperminfos/permInfos).
However, putting them in the EState increases the work required to get
them to parallel workers and to the child estate for EPQs. I would
prefer to keep it in the PlannedStmt but am worried that breaks
convention.

> Then, as a follow-up, maybe we could consider removing
> PlannedStmt.resultRelations.  (The deprecated)
> ExecRelationIsTargetRelation() could use the new Bitmapset, which
> would be more efficient.

Yea, I like this and think it makes sense. Done in v47.

> I did quickly look over the remaining patches. I wondered if you might
> want to add a new ScanOption SO_NONE = 0, or SO_EMPTY_FLAGS. It might
> make the places where you're passing zero directly easier to read?

That makes sense to me. Done in v47.

- Melanie


Attachments:

  [text/x-patch] v47-0001-Make-it-cheap-to-check-with-relations-are-modifi.patch (4.4K, 2-v47-0001-Make-it-cheap-to-check-with-relations-are-modifi.patch)
  download | inline diff:
From 499fe3dbdddb6321b5f09d9d94e37a5c97303bda Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 09:21:22 -0400
Subject: [PATCH v47 1/6] Make it cheap to check which relations are modified by
 a query

Save the range table indexes of result relations and row mark relations in
separate bitmaps in the PlannedStmt. Precomputing them allows cheap membership
checks during execution. With a few exceptions, these two groups comprise all
relations that will be modified by a query. This includes relations targeted by
INSERT, UPDATE, DELETE, and MERGE as well as relations with any row mark (like
SELECT FOR UPDATE).

A later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map -- which
would be counterproductive if the query will modify the page.

PlannedStmt->resultRelations is only used in a membership check, so it may make
sense to replace its usage with the new resultRelationRelids.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c  |  2 ++
 src/backend/optimizer/plan/planner.c | 19 ++++++++++++++++++-
 src/include/nodes/plannodes.h        |  9 +++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..791fcb88de9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,8 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
+	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d19800ad6a5..df4c99fc3ff 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,11 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *resultRelationRelids = NULL;
+	Bitmapset  *rowMarkRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +664,20 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute resultRelationRelids and rowMarkRelids from resultRelations and
+	 * rowMarks for quick access.
+	 */
+	foreach(lc, glob->resultRelations)
+		resultRelationRelids = bms_add_member(resultRelationRelids,
+											  lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		rowMarkRelids = bms_add_member(rowMarkRelids,
+									   ((PlanRowMark *) lfirst(lc))->rti);
+	result->resultRelationRelids = resultRelationRelids;
+	result->rowMarkRelids = rowMarkRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..88be65d7bde 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,9 @@ typedef struct PlannedStmt
 	/* integer list of RT indexes, or NIL */
 	List	   *resultRelations;
 
+	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
+	Bitmapset  *resultRelationRelids;
+
 	/* list of AppendRelInfo nodes */
 	List	   *appendRelations;
 
@@ -138,6 +141,12 @@ typedef struct PlannedStmt
 	/* a list of PlanRowMark's */
 	List	   *rowMarks;
 
+	/*
+	 * RT indexes of relations with row marks. Useful for quick membership
+	 * checks instead of iterating through rowMarks.
+	 */
+	Bitmapset  *rowMarkRelids;
+
 	/* OIDs of relations the plan depends on */
 	List	   *relationOids;
 
-- 
2.43.0



  [text/x-patch] v47-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch (3.8K, 3-v47-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch)
  download | inline diff:
From bf13aaf3f9a9610e7e7be381dfdb7242f22761c7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 08:35:00 -0400
Subject: [PATCH v47 2/6] Remove PlannedStmt->resultRelations in favor of
 resultRelationRelids

PlannedStmt->resultRelations was an integer list of range table indexes.
Now that we have a bitmapset, which offers cheap membership checks,
remove the list and update all consumers to use the bitmapset.
---
 contrib/pg_overexplain/pg_overexplain.c | 5 +++--
 src/backend/executor/execParallel.c     | 1 -
 src/backend/executor/execUtils.c        | 2 +-
 src/backend/optimizer/plan/planner.c    | 1 -
 src/include/nodes/plannodes.h           | 4 ----
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index c2b90493cc6..b4e90909289 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -780,8 +780,9 @@ overexplain_range_table(PlannedStmt *plannedstmt, ExplainState *es)
 		overexplain_bitmapset("Unprunable RTIs", plannedstmt->unprunableRelids,
 							  es);
 	if (es->format != EXPLAIN_FORMAT_TEXT ||
-		plannedstmt->resultRelations != NIL)
-		overexplain_intlist("Result RTIs", plannedstmt->resultRelations, es);
+		!bms_is_empty(plannedstmt->resultRelationRelids))
+		overexplain_bitmapset("Result RTIs", plannedstmt->resultRelationRelids,
+							  es);
 
 	/* Close group, we're all done */
 	ExplainCloseGroup("Range Table", "Range Table", false, es);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 791fcb88de9..1bab6160036 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -191,7 +191,6 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
 	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
-	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
 	pstmt->planOrigin = PLAN_STMT_INTERNAL;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..36c5285d252 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -733,7 +733,7 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
+	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df4c99fc3ff..9853443209d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -659,7 +659,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 											  glob->prunableRelids);
 	result->permInfos = glob->finalrteperminfos;
 	result->subrtinfos = glob->subrtinfos;
-	result->resultRelations = glob->resultRelations;
 	result->appendRelations = glob->appendRelations;
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 88be65d7bde..19e5d814c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -117,10 +117,6 @@ typedef struct PlannedStmt
 	 */
 	List	   *permInfos;
 
-	/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
-	/* integer list of RT indexes, or NIL */
-	List	   *resultRelations;
-
 	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
 	Bitmapset  *resultRelationRelids;
 
-- 
2.43.0



  [text/x-patch] v47-0003-Thread-flags-through-begin-scan-APIs.patch (34.2K, 4-v47-0003-Thread-flags-through-begin-scan-APIs.patch)
  download | inline diff:
From 58f307d8f03fbcfbc4933e3f2cecf752294d804c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v47 3/6] Thread flags through begin-scan APIs

Add an AM user-settable flags parameter to several of the table
scan functions, one table AM callback, and index_beginscan(). This
allows users to pass additional context to be used when building the
scan descriptors.

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  6 +-
 src/backend/access/index/indexam.c        | 10 +--
 src/backend/access/nbtree/nbtsort.c       |  3 +-
 src/backend/access/table/tableam.c        | 22 +++---
 src/backend/commands/constraint.c         |  3 +-
 src/backend/commands/copyto.c             |  3 +-
 src/backend/commands/tablecmds.c          | 13 ++--
 src/backend/commands/typecmds.c           |  6 +-
 src/backend/executor/execIndexing.c       |  4 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  3 +-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++-
 src/backend/executor/nodeIndexscan.c      | 12 ++--
 src/backend/executor/nodeSamplescan.c     |  3 +-
 src/backend/executor/nodeSeqscan.c        |  9 ++-
 src/backend/executor/nodeTidrangescan.c   |  7 +-
 src/backend/partitioning/partbounds.c     |  3 +-
 src/backend/utils/adt/selfuncs.c          |  3 +-
 src/include/access/genam.h                |  6 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 81 ++++++++++++++++-------
 26 files changed, 157 insertions(+), 84 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..d164c4c03ad 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, SO_NONE);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..bdb30752e09 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..9d83a495775 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..99280cd8159 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0,
+									SO_NONE);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +774,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL,
+									SO_NONE);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1408989c568 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0,
+										 SO_NONE);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +717,8 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0,
+									 SO_NONE);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..13cdbb86cd7 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -258,7 +258,8 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -594,7 +595,8 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -616,7 +618,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..756dfa3dcf4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									SO_NONE);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..86481d7c029 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, SO_NONE);
 }
 
 
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +177,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +186,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +208,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +217,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +250,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, SO_NONE);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..421d8c359f0 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,8 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation,
+															SO_NONE);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..f0e0147c665 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,8 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+							   SO_NONE);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..ec0063287d0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13981,8 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, snapshot, 0, NULL,
+						   SO_NONE);
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22883,8 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23348,8 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL,
+						   SO_NONE);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..cd38e9cddf4 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,8 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3267,8 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cc6eb3a6ee9 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,9 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index,
+								 &DirtySnapshot, NULL, indnkeyatts, 0,
+								 SO_NONE);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..fea8991cb04 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,8 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   &snap, NULL, skey_attoff, 0, SO_NONE);
 
 retry:
 	found = false;
@@ -383,7 +384,8 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,8 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +669,8 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   SnapshotAny, NULL, skey_attoff, 0, SO_NONE);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..69683d81527 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..02df40f32c5 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -794,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +862,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..3c0b8daf664 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +210,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1730,7 +1732,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1797,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cf32df33d82 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,8 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode,
+									 SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..09ccc65de1c 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,8 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL,
+								   SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +376,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +410,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..084e4c6ec90 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,8 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid,
+												SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..f867d1b75a5 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..4160d2d6e24 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7178,7 +7178,8 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0,
+								 SO_NONE);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b69320a7fc8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,8 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys,
+									 uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..60ceee9decd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -45,6 +45,8 @@ typedef struct ValidateIndexState ValidateIndexState;
  */
 typedef enum ScanOptions
 {
+	SO_NONE = 0,
+
 	/* one of SO_TYPE_* may be specified */
 	SO_TYPE_SEQSCAN = 1 << 0,
 	SO_TYPE_BITMAPSCAN = 1 << 1,
@@ -65,6 +67,19 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table scan functions and
+ * shouldn't be passed by callers. Some of these are effectively set by callers
+ * through parameters to table scan functions (e.g. SO_ALLOW_STRAT/allow_strat).
+ * For now, however, retain tight control over them and don't allow callers to
+ * pass these themselves to table scan functions.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +435,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +886,19 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	Assert((flags & ~SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +916,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +951,8 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -939,11 +963,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +982,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1007,8 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -994,7 +1021,8 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -1059,12 +1087,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1168,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1179,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1206,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1218,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v47-0004-Pass-down-information-on-table-modification-to-s.patch (10.0K, 5-v47-0004-Pass-down-information-on-table-modification-to-s.patch)
From 975d619cbf3d9158a8134c9c62f8cf936290574b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v47 4/6] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          | 21 +++++++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  3 ++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++++++---
 src/backend/executor/nodeIndexscan.c      | 12 ++++++++----
 src/backend/executor/nodeSamplescan.c     |  3 ++-
 src/backend/executor/nodeSeqscan.c        | 10 +++++++---
 src/backend/executor/nodeTidrangescan.c   | 11 ++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 36c5285d252..f090de49921 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,27 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ *
+ * This is not perfectly accurate. INSERT ... SELECT from the same table does
+ * not add the scan relation to resultRelationRelids, so it will be reported
+ * as read-only even though the query modifies it.
+ *
+ * Conversely, when any relation in the query has a modifying row mark, all
+ * other relations get a ROW_MARK_REFERENCE, causing them to be reported as
+ * not read-only even though they may only be read.
+ */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	Index		scanrelid = ((Scan *) ss->ps.plan)->scanrelid;
+	PlannedStmt *pstmt = ss->ps.state->es_plannedstmt;
+
+	return !bms_is_member(scanrelid, pstmt->resultRelationRelids) &&
+		!bms_is_member(scanrelid, pstmt->rowMarkRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 69683d81527..73831aed451 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -149,7 +149,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL,
-							   SO_NONE);
+							   ScanRelIsReadOnly(&node->ss) ?
+							   SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 02df40f32c5..de6154fd541 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -96,7 +96,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -796,7 +797,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -863,7 +865,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3c0b8daf664..1620d146071 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -211,7 +212,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1733,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1798,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cf32df33d82..f3d273e1c5e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -299,7 +299,8 @@ tablesample_init(SampleScanState *scanstate)
 									 scanstate->use_bulkread,
 									 allow_sync,
 									 scanstate->use_pagemode,
-									 SO_NONE);
+									 ScanRelIsReadOnly(&scanstate->ss) ?
+									 SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 09ccc65de1c..04803b0e37d 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -72,7 +72,8 @@ SeqNext(SeqScanState *node)
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
 								   0, NULL,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,9 +376,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -411,5 +414,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 084e4c6ec90..4a8fe91b2b3 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -246,7 +246,8 @@ TidRangeNext(TidRangeScanState *node)
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid,
-												SO_NONE);
+												ScanRelIsReadOnly(&node->ss) ?
+												SO_HINT_REL_READ_ONLY : SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -461,7 +462,9 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -495,5 +498,7 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 60ceee9decd..5f1c1079cb5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0



  [text/x-patch] v47-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 6-v47-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 3c4c589c84fb5444fe40b8a8eec506845d1130e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v47 5/6] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid write amplification
caused by vacuum later needing to set the page all-visible, which
triggers a write and potentially an FPI. It also makes index-only scans
viable more often, since they rely on pages being marked all-visible in
the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++-
 src/backend/access/heap/pruneheap.c      | 56 +++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              |  3 +-
 5 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 99280cd8159..3433ea93c11 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..48f7cf77bc8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -251,9 +254,20 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
  * VM corruption during pruning, we will fix it. Caller is responsible for
  * unpinning *vmbuffer.
+ *
+ * rel_read_only is true if we determined at plan time that the query does not
+ * modify the relation. It is counterproductive to set the VM if the query
+ * will immediately clear it.
+ *
+ * As noted in ScanRelIsReadOnly(), INSERT ... SELECT on the same table will
+ * report the scan relation as read-only. This is usually harmless in
+ * practice. It is useful to set scanned pages all-visible that won't be
+ * inserted into. Pages we do insert to rarely meet the criteria for pruning,
+ * and those that do will contain in-progress inserts after the first tuple.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +350,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +408,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +478,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +936,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1199,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..f2a009141be 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -431,7 +432,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v47-0006-Set-pd_prune_xid-on-insert.patch (8.8K, 7-v47-0006-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 7f874b2f759a48a4553b85a2e7655075a311f32e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v47 6/6] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time it is read after being filled with newly inserted
tuples. The page is then set all-visible while it is still in shared
buffers, avoiding potential I/O amplification when vacuum later has to
scan the page and set it all-visible. It also enables index-only scans
of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
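
As a rough illustration of the rule described above (not code from the
patch), the following standalone sketch models the hint with plain
integers. The Demo* types and demo_* functions are hypothetical stand-ins;
the real code uses PageSetPrunable(), TransactionIdIsNormal() and the
HEAP_INSERT_FROZEN option, and the simplified wraparound comparison here
is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

#define DEMO_INVALID_XID       ((DemoXid) 0)
#define DEMO_FIRST_NORMAL_XID  ((DemoXid) 3)

/* Hypothetical stand-in for a heap page header; only the hint is modeled. */
typedef struct DemoPage
{
    DemoXid     prune_xid;
} DemoPage;

/* Bootstrap and frozen XIDs sort below the first normal XID. */
static bool
demo_xid_is_normal(DemoXid xid)
{
    return xid >= DEMO_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is older than b"; meaningful for normal XIDs only. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID seen so far as the page's prune hint. */
static void
demo_page_set_prunable(DemoPage *page, DemoXid xid)
{
    assert(demo_xid_is_normal(xid));
    if (page->prune_xid == DEMO_INVALID_XID ||
        demo_xid_precedes(xid, page->prune_xid))
        page->prune_xid = xid;
}

/* Guard in the spirit of heap_insert(): skip bootstrap and frozen inserts. */
static void
demo_hint_after_insert(DemoPage *page, DemoXid xid, bool insert_frozen)
{
    if (demo_xid_is_normal(xid) && !insert_frozen)
        demo_page_set_prunable(page, xid);
}
```

In this model a frozen or bootstrap-mode insert leaves the hint untouched,
while an ordinary insert leaves the page with a valid hint, which is what
lets a later read attempt on-access pruning at all.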

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..cdaf57e3f12 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4152,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 48f7cf77bc8..5bb9e929acf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -285,7 +285,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1928,17 +1929,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-27 19:31  Melanie Plageman <[email protected]>
  parent: Tomas Vondra <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-03-27 19:31 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Mar 26, 2026 at 12:07 PM Tomas Vondra <[email protected]> wrote:
>
> >> Ah, so we expect people to invent their "own" flags, outside what's in
> >> ScanOptions? Or do I misunderstand how it works? (I admit not reading
> >> the whole massive thread, as I was only interested in using the flags in
> >> my own patch.)
> >
> > Yes, this isn't really explored in the rest of the thread. I thought
> > since the flags are threaded all the way through and they can
> > set/check the flags in the table AM-specific layer, it would make
> > sense that they could choose flags for their own purposes. They don't
> > have to wait for consensus on getting a new SO type added. I don't
> > know if this is a bad idea. However, changing the table AM wrappers
> > seems more justifiable if we are making them extensible in this way.
> >
>
> No idea. Do we have an example of a TAM actually needing this? If not,
> I'd probably advise to remove that and keep the patch simpler. My past
> attempts to future-proof a patch like this rarely worked.

Yea, not allowing that doesn't really simplify the patch.
But, talking to Andres off-list yesterday, he reminded me that users
can simply add a new member to their table access method-specific scan
descriptor (e.g. HeapScanDescData could get a new member). The value
of flags lies in enabling table AM-agnostic executor code to pass
flags through the table AM to the scan code. Besides my read-only hint
scan option, he gave some examples -- like a hint to the scan that
there is a LIMIT on the query. I think that is compelling.

While exploring this, I realized that for a few internal flags, such
as SO_ALLOW_STRAT and SO_ALLOW_SYNC, we have table scan functions,
like table_beginscan_strat(), that accept parameters for setting those
flags. They are basically the same as table_beginscan() but give users
control over those flags. I think we can use the flags parameter to
deprecate some of these specialized table scan functions. I think we
can simplify the scan_rescan() callback as well. I don't think it
makes sense to do it this late in the 19 release, though. All of those
changes require having a flags parameter in the top level scan
wrappers first. So, I think it is reasonable to do just that this
release.
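
To make the deprecation idea concrete, here is a minimal sketch, under
the assumption that the strategy and sync options could be supplied by
the caller as flags. It is not code from the patch set: the DEMO_* values
and demo_* functions are hypothetical, and the "scan" is reduced to the
flag word it would carry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; not the values used by PostgreSQL's ScanOptions. */
#define DEMO_SO_ALLOW_STRAT  (1u << 0)
#define DEMO_SO_ALLOW_SYNC   (1u << 1)

/* Flag-taking entry point: returns the flags the scan would run with. */
static uint32_t
demo_beginscan(uint32_t flags)
{
    return flags;
}

/* Translate the two bools of a specialized wrapper into flag bits. */
static uint32_t
demo_flags_from_bools(bool allow_strat, bool allow_sync)
{
    uint32_t    flags = 0;

    if (allow_strat)
        flags |= DEMO_SO_ALLOW_STRAT;
    if (allow_sync)
        flags |= DEMO_SO_ALLOW_SYNC;
    return flags;
}

/* The specialized wrapper shrinks to a shim over the flag-taking function. */
static uint32_t
demo_beginscan_strat(bool allow_strat, bool allow_sync)
{
    return demo_beginscan(demo_flags_from_bools(allow_strat, allow_sync));
}
```

Under that assumption, a caller of the bool-taking wrapper and a caller
passing the equivalent flags end up with the same scan settings, which is
why the wrapper could eventually be retired.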

- Melanie





^ permalink  raw  reply  [nested|flat] 143+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-29 17:16  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-03-29 17:16 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Mar 27, 2026 at 3:17 PM Melanie Plageman
<[email protected]> wrote:
>
>  Done in v47.

Attached v48 does a bit more cleanup. No functional changes. I'm
planning to push this soon. I think my remaining question is whether I
should move the row marks and result relation bitmaps into the estate.
I'm leaning toward not doing that and leaving them in the PlannedStmt.
Anyway, if I want to replace the list of result relation RTIs in the
PlannedStmt, I have to leave the bitmapset version there.

- Melanie


Attachments:

  [text/x-patch] v48-0001-Make-it-cheap-to-check-if-a-relation-is-modified.patch (4.4K, 2-v48-0001-Make-it-cheap-to-check-if-a-relation-is-modified.patch)
  download | inline diff:
From 04d24039ec7c14672955aaaba37e3aa512858a0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 09:21:22 -0400
Subject: [PATCH v48 1/6] Make it cheap to check if a relation is modified by a
 query

Save the range table indexes of result relations and row mark relations
in separate bitmaps in the PlannedStmt. Precomputing them allows cheap
membership checks during execution. With a few exceptions, these two
groups comprise all relations that will be modified by a query. This
includes relations targeted by INSERT, UPDATE, DELETE, and MERGE as well
as relations with any row mark (like SELECT FOR UPDATE).

A later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map -- which
would be counterproductive if the query will modify the page.

PlannedStmt->resultRelations is only used in a membership check, so it
may make sense to replace its usage with the new resultRelationRelids.
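
The membership check this enables can be sketched in isolation as
follows. This is an illustrative model only: DemoRelids is a hypothetical
fixed-size bitmap standing in for Bitmapset, and the demo_* helpers stand
in for bms_add_member() and bms_is_member(). As noted above, the two sets
cover modified relations only "with a few exceptions", so the final check
is a heuristic rather than a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RTI  256
#define DEMO_NWORDS   (DEMO_MAX_RTI / 64)

/* Hypothetical fixed-size bitmap of range table indexes. */
typedef struct DemoRelids
{
    uint64_t    words[DEMO_NWORDS];
} DemoRelids;

static void
demo_relids_add(DemoRelids *set, unsigned rti)
{
    assert(rti < DEMO_MAX_RTI);
    set->words[rti / 64] |= UINT64_C(1) << (rti % 64);
}

/* Constant-time membership test, unlike walking an integer list. */
static bool
demo_relids_is_member(const DemoRelids *set, unsigned rti)
{
    if (rti >= DEMO_MAX_RTI)
        return false;
    return ((set->words[rti / 64] >> (rti % 64)) & 1) != 0;
}

/* Precompute the bitmap once, as the planner loops do for each list. */
static void
demo_relids_from_list(DemoRelids *set, const unsigned *rtis, size_t n)
{
    for (size_t i = 0; i < n; i++)
        demo_relids_add(set, rtis[i]);
}

/* A relation may be modified if it is a result relation or has a row mark. */
static bool
demo_rel_may_be_modified(const DemoRelids *result_rels,
                         const DemoRelids *rowmark_rels,
                         unsigned rti)
{
    return demo_relids_is_member(result_rels, rti) ||
        demo_relids_is_member(rowmark_rels, rti);
}
```

The cost of building the bitmaps is paid once at plan time, while the
check itself can then run for every scan without iterating over a list.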

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c  |  2 ++
 src/backend/optimizer/plan/planner.c | 19 ++++++++++++++++++-
 src/include/nodes/plannodes.h        |  9 +++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..791fcb88de9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,8 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
+	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d19800ad6a5..df4c99fc3ff 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,11 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *resultRelationRelids = NULL;
+	Bitmapset  *rowMarkRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +664,20 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute resultRelationRelids and rowMarkRelids from resultRelations and
+	 * rowMarks for quick access.
+	 */
+	foreach(lc, glob->resultRelations)
+		resultRelationRelids = bms_add_member(resultRelationRelids,
+											  lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		rowMarkRelids = bms_add_member(rowMarkRelids,
+									   ((PlanRowMark *) lfirst(lc))->rti);
+	result->resultRelationRelids = resultRelationRelids;
+	result->rowMarkRelids = rowMarkRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..88be65d7bde 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,9 @@ typedef struct PlannedStmt
 	/* integer list of RT indexes, or NIL */
 	List	   *resultRelations;
 
+	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
+	Bitmapset  *resultRelationRelids;
+
 	/* list of AppendRelInfo nodes */
 	List	   *appendRelations;
 
@@ -138,6 +141,12 @@ typedef struct PlannedStmt
 	/* a list of PlanRowMark's */
 	List	   *rowMarks;
 
+	/*
+	 * RT indexes of relations with row marks. Useful for quick membership
+	 * checks instead of iterating through rowMarks.
+	 */
+	Bitmapset  *rowMarkRelids;
+
 	/* OIDs of relations the plan depends on */
 	List	   *relationOids;
 
-- 
2.43.0



  [text/x-patch] v48-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch (3.8K, 3-v48-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch)
  download | inline diff:
From 7c331c575a377b40a1dd1142b23fa3a8692de38f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 08:35:00 -0400
Subject: [PATCH v48 2/6] Remove PlannedStmt->resultRelations in favor of
 resultRelationRelids

PlannedStmt->resultRelations was an integer list of range table indexes.
Now that we have a bitmapset, which offers cheap membership checks,
remove the list and update all consumers to use the bitmapset.
---
 contrib/pg_overexplain/pg_overexplain.c | 5 +++--
 src/backend/executor/execParallel.c     | 1 -
 src/backend/executor/execUtils.c        | 2 +-
 src/backend/optimizer/plan/planner.c    | 1 -
 src/include/nodes/plannodes.h           | 4 ----
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index c2b90493cc6..b4e90909289 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -780,8 +780,9 @@ overexplain_range_table(PlannedStmt *plannedstmt, ExplainState *es)
 		overexplain_bitmapset("Unprunable RTIs", plannedstmt->unprunableRelids,
 							  es);
 	if (es->format != EXPLAIN_FORMAT_TEXT ||
-		plannedstmt->resultRelations != NIL)
-		overexplain_intlist("Result RTIs", plannedstmt->resultRelations, es);
+		!bms_is_empty(plannedstmt->resultRelationRelids))
+		overexplain_bitmapset("Result RTIs", plannedstmt->resultRelationRelids,
+							  es);
 
 	/* Close group, we're all done */
 	ExplainCloseGroup("Range Table", "Range Table", false, es);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 791fcb88de9..1bab6160036 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -191,7 +191,6 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
 	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
-	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
 	pstmt->planOrigin = PLAN_STMT_INTERNAL;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..36c5285d252 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -733,7 +733,7 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
+	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df4c99fc3ff..9853443209d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -659,7 +659,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 											  glob->prunableRelids);
 	result->permInfos = glob->finalrteperminfos;
 	result->subrtinfos = glob->subrtinfos;
-	result->resultRelations = glob->resultRelations;
 	result->appendRelations = glob->appendRelations;
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 88be65d7bde..19e5d814c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -117,10 +117,6 @@ typedef struct PlannedStmt
 	 */
 	List	   *permInfos;
 
-	/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
-	/* integer list of RT indexes, or NIL */
-	List	   *resultRelations;
-
 	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
 	Bitmapset  *resultRelationRelids;
 
-- 
2.43.0



  [text/x-patch] v48-0003-Thread-flags-through-begin-scan-APIs.patch (37.1K, 4-v48-0003-Thread-flags-through-begin-scan-APIs.patch)
  download | inline diff:
From 05cc37abae70327fda4bee4a392dfebcc08ec3c5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v48 3/6] Thread flags through begin-scan APIs

Add an AM user-settable flags parameter to several of the table
scan functions, one table AM callback, and index_beginscan(). This
allows users to pass additional context to be used when building the
scan descriptors.

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.
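
A simplified model of the split between internally chosen flags and
caller-supplied flags might look like the sketch below. It is not the
patch's implementation: the DEMO_* flag values, DemoScanDesc, and the
read-only hint flag name are hypothetical, and combining the two flag
words with a bitwise OR is an assumption about what the common begin-scan
routine does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not PostgreSQL's ScanOptions. */
#define DEMO_SO_NONE                0u
#define DEMO_SO_TYPE_SEQSCAN        (1u << 0)
#define DEMO_SO_ALLOW_STRAT         (1u << 1)
#define DEMO_SO_ALLOW_SYNC          (1u << 2)
#define DEMO_SO_ALLOW_PAGEMODE      (1u << 3)
#define DEMO_SO_HINT_REL_READ_ONLY  (1u << 4)   /* caller-settable hint */

/* Flags that only the scan wrappers themselves may set. */
#define DEMO_SO_INTERNAL_FLAGS \
    (DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_STRAT | \
     DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE)

typedef struct DemoScanDesc
{
    uint32_t    flags;
} DemoScanDesc;

static bool
demo_flags_are_user_settable(uint32_t flags)
{
    return (flags & DEMO_SO_INTERNAL_FLAGS) == 0;
}

/* Common path: reject internal bits from callers, then merge both words. */
static DemoScanDesc
demo_beginscan_common(uint32_t internal_flags, uint32_t flags)
{
    DemoScanDesc scan;

    assert(demo_flags_are_user_settable(flags));
    scan.flags = internal_flags | flags;
    return scan;
}

/* Wrapper: picks the internal flags, passes the caller's flags through. */
static DemoScanDesc
demo_beginscan(uint32_t flags)
{
    uint32_t    internal_flags = DEMO_SO_TYPE_SEQSCAN | DEMO_SO_ALLOW_STRAT |
        DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE;

    return demo_beginscan_common(internal_flags, flags);
}

/* AM-side check of a hint that the executor passed down. */
static bool
demo_scan_rel_is_read_only(const DemoScanDesc *scan)
{
    return (scan->flags & DEMO_SO_HINT_REL_READ_ONLY) != 0;
}
```

Callers with nothing to say pass the "none" value, as the SO_NONE call
sites in this patch do, and the AM-specific scan code can later test for
a hint without the wrappers needing a new parameter per hint.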

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |   2 +-
 src/backend/access/brin/brin.c            |   3 +-
 src/backend/access/gin/gininsert.c        |   3 +-
 src/backend/access/heap/heapam_handler.c  |   9 +-
 src/backend/access/index/genam.c          |   6 +-
 src/backend/access/index/indexam.c        |  13 ++-
 src/backend/access/nbtree/nbtsort.c       |   3 +-
 src/backend/access/table/tableam.c        |  22 ++---
 src/backend/commands/constraint.c         |   3 +-
 src/backend/commands/copyto.c             |   3 +-
 src/backend/commands/tablecmds.c          |  13 +--
 src/backend/commands/typecmds.c           |   6 +-
 src/backend/executor/execIndexing.c       |   4 +-
 src/backend/executor/execReplication.c    |  12 ++-
 src/backend/executor/nodeBitmapHeapscan.c |   3 +-
 src/backend/executor/nodeIndexonlyscan.c  |   9 +-
 src/backend/executor/nodeIndexscan.c      |  12 ++-
 src/backend/executor/nodeSamplescan.c     |   3 +-
 src/backend/executor/nodeSeqscan.c        |   9 +-
 src/backend/executor/nodeTidrangescan.c   |   7 +-
 src/backend/partitioning/partbounds.c     |   3 +-
 src/backend/utils/adt/selfuncs.c          |   3 +-
 src/include/access/genam.h                |   6 +-
 src/include/access/heapam.h               |   5 +-
 src/include/access/relscan.h              |   6 ++
 src/include/access/tableam.h              | 103 ++++++++++++++++------
 26 files changed, 185 insertions(+), 86 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..d164c4c03ad 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, SO_NONE);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..bdb30752e09 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..9d83a495775 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..99280cd8159 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0,
+									SO_NONE);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +774,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL,
+									SO_NONE);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1408989c568 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0,
+										 SO_NONE);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +717,8 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0,
+									 SO_NONE);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..44496ae0963 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -258,7 +258,8 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -588,13 +589,17 @@ index_parallelrescan(IndexScanDesc scan)
 /*
  * index_beginscan_parallel - join parallel index scan
  *
+ * flags is a bitmask of ScanOptions affecting the underlying table scan. No
+ * SO_INTERNAL_FLAGS are permitted.
+ *
  * Caller must be holding suitable locks on the heap and the index.
  */
 IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -616,7 +621,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..756dfa3dcf4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									SO_NONE);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..86481d7c029 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, SO_NONE);
 }
 
 
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +177,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +186,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +208,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +217,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +250,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, SO_NONE);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..421d8c359f0 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,8 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation,
+															SO_NONE);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..f0e0147c665 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,8 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+							   SO_NONE);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..ec0063287d0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13981,8 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, snapshot, 0, NULL,
+						   SO_NONE);
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22883,8 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23348,8 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL,
+						   SO_NONE);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..cd38e9cddf4 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,8 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3267,8 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cc6eb3a6ee9 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,9 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index,
+								 &DirtySnapshot, NULL, indnkeyatts, 0,
+								 SO_NONE);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..fea8991cb04 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,8 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   &snap, NULL, skey_attoff, 0, SO_NONE);
 
 retry:
 	found = false;
@@ -383,7 +384,8 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,8 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +669,8 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   SnapshotAny, NULL, skey_attoff, 0, SO_NONE);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..69683d81527 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..02df40f32c5 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -794,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +862,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..3c0b8daf664 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +210,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1730,7 +1732,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1797,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cf32df33d82 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,8 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode,
+									 SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..09ccc65de1c 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,8 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL,
+								   SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +376,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +410,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..084e4c6ec90 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,8 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid,
+												SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..f867d1b75a5 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..4160d2d6e24 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7178,7 +7178,8 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0,
+								 SO_NONE);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b69320a7fc8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,8 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys,
+									 uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..960abf6c214 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Bitmask of ScanOptions affecting the relation. No SO_INTERNAL_FLAGS are
+	 * permitted.
+	 */
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f8d1423b2d0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -45,6 +45,8 @@ typedef struct ValidateIndexState ValidateIndexState;
  */
 typedef enum ScanOptions
 {
+	SO_NONE = 0,
+
 	/* one of SO_TYPE_* may be specified */
 	SO_TYPE_SEQSCAN = 1 << 0,
 	SO_TYPE_BITMAPSCAN = 1 << 1,
@@ -65,6 +67,19 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table scan functions and
+ * shouldn't be passed by callers. Some of these are effectively set by callers
+ * through parameters to table scan functions (e.g. SO_ALLOW_STRAT/allow_strat).
+ * For now, however, retain tight control over them and don't allow callers to
+ * pass them directly to table scan functions.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -321,8 +336,9 @@ typedef struct TableAmRoutine
 	 * `flags` is a bitmask indicating the type of scan (ScanOptions's
 	 * SO_TYPE_*, currently only one may be specified), options controlling
 	 * the scan's behaviour (ScanOptions's SO_ALLOW_*, several may be
-	 * specified, an AM may ignore unsupported ones) and whether the snapshot
-	 * needs to be deallocated at scan_end (ScanOptions's SO_TEMP_SNAPSHOT).
+	 * specified, an AM may ignore unsupported ones), whether the snapshot
+	 * needs to be deallocated at scan_end (ScanOptions's SO_TEMP_SNAPSHOT),
+	 * and any number of the other ScanOptions values.
 	 */
 	TableScanDesc (*scan_begin) (Relation rel,
 								 Snapshot snapshot,
@@ -418,9 +434,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * flags is a bitmask of ScanOptions affecting underlying table scan
+	 * behavior. See scan_begin() for more information on passing these.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +890,19 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	Assert((flags & ~SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -891,15 +917,18 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 /*
  * Start a scan of `rel`. Returned tuples pass a visibility test of
  * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +957,8 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -936,14 +966,17 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  * TableScanDesc for a bitmap heap scan.  Although that scan technology is
  * really quite unlike a standard seqscan, there is just enough commonality to
  * make it worth using the same data structure.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -952,23 +985,26 @@ table_beginscan_bm(Relation rel, Snapshot snapshot,
  * using the same data structure although the behavior is rather different.
  * In addition to the options offered by table_beginscan_strat, this call
  * also allows control of whether page-mode visibility checking is used.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1017,8 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -994,7 +1031,8 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -1055,16 +1093,19 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 /*
  * table_beginscan_tidrange is the entry point for setting up a TableScanDesc
  * for a TID range scan.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1136,20 +1177,26 @@ extern void table_parallelscan_initialize(Relation rel,
  * table_parallelscan_initialize(), for the same relation. The initialization
  * does not need to have happened in this backend.
  *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
+ *
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
  * with table_parallelscan_initialize(), for the same relation. The
  * initialization does not need to have happened in this backend.
  *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
+ *
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,11 +1219,15 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1236,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
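
For readers following the flag split in the patch above, here is a minimal
standalone sketch of the same idea: one mask is owned by the wrapper
functions, the other by the caller, and the common entry point checks that
neither side strays into the other's bits before merging them. This is not
PostgreSQL code. Every DEMO_* name and bit value below is hypothetical, and
the check returns a bool rather than using Assert() so that it can be
exercised directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few ScanOptions bits; values are illustrative. */
enum
{
	DEMO_NONE = 0,
	DEMO_TYPE_SEQSCAN = 1 << 0,
	DEMO_ALLOW_STRAT = 1 << 1,
	DEMO_TEMP_SNAPSHOT = 1 << 2,
	DEMO_HINT_EXAMPLE = 1 << 3	/* a caller-settable option */
};

/* Analogue of SO_INTERNAL_FLAGS: bits that only the wrappers may set. */
#define DEMO_INTERNAL_FLAGS \
	(DEMO_TYPE_SEQSCAN | DEMO_ALLOW_STRAT | DEMO_TEMP_SNAPSHOT)

/* True when each mask stays on its own side of the boundary. */
static bool
demo_flags_are_valid(uint32_t internal_flags, uint32_t user_flags)
{
	return (user_flags & DEMO_INTERNAL_FLAGS) == 0 &&
		(internal_flags & ~(uint32_t) DEMO_INTERNAL_FLAGS) == 0;
}

/* Merge the two masks once both have been checked. */
static uint32_t
demo_combine_flags(uint32_t internal_flags, uint32_t user_flags)
{
	assert(demo_flags_are_valid(internal_flags, user_flags));
	return internal_flags | user_flags;
}
```

Keeping the two masks as separate parameters until the last moment is what
makes the second check possible: once they are OR'ed together there is no way
to tell which side contributed a given bit.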



  [text/x-patch] v48-0004-Pass-down-information-on-table-modification-to-s.patch (10.0K, 5-v48-0004-Pass-down-information-on-table-modification-to-s.patch)
From 239ec276e5bee0f59ae0a91d0bd9eff8842c8a63 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v48 4/6] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          | 21 +++++++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  3 ++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++++++---
 src/backend/executor/nodeIndexscan.c      | 12 ++++++++----
 src/backend/executor/nodeSamplescan.c     |  3 ++-
 src/backend/executor/nodeSeqscan.c        | 10 +++++++---
 src/backend/executor/nodeTidrangescan.c   | 11 ++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 59 insertions(+), 15 deletions(-)
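
As a rough illustration of the decision each scan node makes in this patch,
the sketch below models the membership test with plain 64-bit words instead
of Bitmapsets. It is a toy under stated assumptions, not the executor code:
the DemoPlannedStmt struct, the demo_* helpers, and the bit value chosen for
DEMO_SO_HINT_REL_READ_ONLY are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: range table indexes 1..63 stored as bits of a 64-bit word,
 * standing in for the resultRelationRelids and rowMarkRelids Bitmapsets.
 */
typedef struct DemoPlannedStmt
{
	uint64_t	result_relids;
	uint64_t	rowmark_relids;
} DemoPlannedStmt;

#define DEMO_SO_NONE				0u
#define DEMO_SO_HINT_REL_READ_ONLY	(1u << 10)	/* assumed bit position */

/* Toy replacement for bms_is_member(). */
static bool
demo_is_member(unsigned relid, uint64_t set)
{
	return relid < 64 && ((set >> relid) & 1) != 0;
}

/* Read-only means: neither a result relation nor a row-marked relation. */
static bool
demo_scan_rel_is_read_only(const DemoPlannedStmt *stmt, unsigned scanrelid)
{
	return !demo_is_member(scanrelid, stmt->result_relids) &&
		!demo_is_member(scanrelid, stmt->rowmark_relids);
}

/* The flag value a scan node would pass down when starting its scan. */
static uint32_t
demo_scan_flags(const DemoPlannedStmt *stmt, unsigned scanrelid)
{
	return demo_scan_rel_is_read_only(stmt, scanrelid) ?
		DEMO_SO_HINT_REL_READ_ONLY : DEMO_SO_NONE;
}
```

Because the result is only a hint, the inaccuracies described in the comment
on ScanRelIsReadOnly() (INSERT ... SELECT from the same table, and relations
that receive a ROW_MARK_REFERENCE) change how often the hint is set, not
whether the scan itself is correct.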

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 36c5285d252..f090de49921 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,27 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ *
+ * This is not perfectly accurate. INSERT ... SELECT from the same table does
+ * not add the scan relation to resultRelationRelids, so it will be reported
+ * as read-only even though the query modifies it.
+ *
+ * Conversely, when any relation in the query has a modifying row mark, all
+ * other relations get a ROW_MARK_REFERENCE, causing them to be reported as
+ * not read-only even though they may only be read.
+ */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	Index		scanrelid = ((Scan *) ss->ps.plan)->scanrelid;
+	PlannedStmt *pstmt = ss->ps.state->es_plannedstmt;
+
+	return !bms_is_member(scanrelid, pstmt->resultRelationRelids) &&
+		!bms_is_member(scanrelid, pstmt->rowMarkRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 69683d81527..73831aed451 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -149,7 +149,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL,
-							   SO_NONE);
+							   ScanRelIsReadOnly(&node->ss) ?
+							   SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 02df40f32c5..de6154fd541 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -96,7 +96,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -796,7 +797,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -863,7 +865,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3c0b8daf664..1620d146071 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -211,7 +212,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1733,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1798,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cf32df33d82..f3d273e1c5e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -299,7 +299,8 @@ tablesample_init(SampleScanState *scanstate)
 									 scanstate->use_bulkread,
 									 allow_sync,
 									 scanstate->use_pagemode,
-									 SO_NONE);
+									 ScanRelIsReadOnly(&scanstate->ss) ?
+									 SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 09ccc65de1c..04803b0e37d 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -72,7 +72,8 @@ SeqNext(SeqScanState *node)
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
 								   0, NULL,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,9 +376,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -411,5 +414,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 084e4c6ec90..4a8fe91b2b3 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -246,7 +246,8 @@ TidRangeNext(TidRangeScanState *node)
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid,
-												SO_NONE);
+												ScanRelIsReadOnly(&node->ss) ?
+												SO_HINT_REL_READ_ONLY : SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -461,7 +462,9 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -495,5 +498,7 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f8d1423b2d0..68ddabc171a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
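
The read-only test that this patch threads through the scan nodes can be tried outside the tree. The following is a minimal standalone sketch, not the PostgreSQL code: RelidSet is a hypothetical 64-bit stand-in for a Bitmapset, the lowercase function names are illustrative, and SO_NONE is assumed to be 0. Only the SO_HINT_REL_READ_ONLY value (1 << 10) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RelidSet;

enum
{
	SO_NONE = 0,					/* assumed value */
	SO_HINT_REL_READ_ONLY = 1 << 10 /* value taken from the patch */
};

/* Membership test, playing the role of bms_is_member() in this sketch. */
bool
relid_is_member(unsigned rti, RelidSet set)
{
	assert(rti < 64);
	return ((set >> rti) & 1) != 0;
}

/*
 * A scan relation counts as read-only when it is neither a result relation
 * nor the target of a row mark, mirroring ScanRelIsReadOnly() in the patch.
 */
bool
scan_rel_is_read_only(unsigned scanrelid, RelidSet result_rels,
					  RelidSet rowmark_rels)
{
	return !relid_is_member(scanrelid, result_rels) &&
		!relid_is_member(scanrelid, rowmark_rels);
}

/* Choose the flags a scan node would pass when beginning its scan. */
uint32_t
scan_flags_for(unsigned scanrelid, RelidSet result_rels,
			   RelidSet rowmark_rels)
{
	return scan_rel_is_read_only(scanrelid, result_rels, rowmark_rels) ?
		SO_HINT_REL_READ_ONLY : SO_NONE;
}
```

As the comment on ScanRelIsReadOnly() notes, the result is only a hint: a relation that appears in neither set is treated as read-only even if the query modifies it through some other range-table entry.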



  [text/x-patch] v48-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 6-v48-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From e914b4834e613c59935df55a400a9290cc145b33 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v48 5/6] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
caused when vacuum later has to set the page all-visible, which triggers
a write and potentially an FPI. It also allows more frequent index-only
scans, since they require pages to be marked all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++-
 src/backend/access/heap/pruneheap.c      | 55 ++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              |  3 +-
 5 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eb1f67f31cd..7012ee2c306 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 99280cd8159..3433ea93c11 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..7fcfc844d20 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -251,9 +254,21 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
  * VM corruption during pruning, we will fix it. Caller is responsible for
  * unpinning *vmbuffer.
+ *
+ * rel_read_only is true if we determined at plan time that the query does not
+ * modify the relation. It is counterproductive to set the VM if the query
+ * will immediately clear it.
+ *
+ * As noted in ScanRelIsReadOnly(), INSERT ... SELECT on the same table will
+ * report the scan relation as read-only. This is usually harmless in
+ * practice. It is useful to set scanned pages all-visible that won't be
+ * inserted into. Pages we do insert to rarely meet the criteria for pruning,
+ * and those that do are likely to contain in-progress inserts which make the
+ * page not fully all-visible.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +351,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +409,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +479,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +937,35 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1198,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..f2a009141be 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -431,7 +432,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
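
The decision that heap_page_will_set_vm() makes after this patch can be summarized in a small standalone sketch. This is an illustration under stated assumptions, not the committed function: SketchPageState and its two boolean fields are hypothetical stand-ins for the BufferIsDirty() and XLogCheckBufferNeedsBackup() calls, and the side effects on set_all_visible, set_all_frozen and new_vmbits are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the two prune reasons that appear in the patch are modelled here. */
typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN
} SketchPruneReason;

/* Hypothetical snapshot of the inputs the heuristic looks at. */
typedef struct
{
	bool		attempt_set_vm;		/* caller passed HEAP_PAGE_PRUNE_SET_VM */
	bool		set_all_visible;	/* page turned out to be all-visible */
	bool		buffer_is_dirty;	/* stand-in for BufferIsDirty() */
	bool		needs_fpi;			/* stand-in for XLogCheckBufferNeedsBackup() */
} SketchPageState;

/*
 * Return true if the VM should be set.  On-access callers that are not
 * otherwise pruning or freezing skip the VM update when it would newly
 * dirty the heap page or force a full-page image into the WAL.
 */
bool
sketch_will_set_vm(const SketchPageState *ps, SketchPruneReason reason,
				   bool do_prune, bool do_freeze)
{
	if (!ps->attempt_set_vm)
		return false;

	if (!ps->set_all_visible)
		return false;

	if (reason == SKETCH_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!ps->buffer_is_dirty || ps->needs_fpi))
		return false;

	return true;
}
```

The extra check applies only to on-access pruning; a vacuum scan with an all-visible page sets the VM regardless of the buffer's dirty state.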



  [text/x-patch] v48-0006-Set-pd_prune_xid-on-insert.patch (8.6K, 7-v48-0006-Set-pd_prune_xid-on-insert.patch)
From 13f3c314d760bce33ca48ea6d1cde606b62cad4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v48 6/6] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that on-access pruning can update the visibility map (VM) during
read-only queries, set the page’s pd_prune_xid hint during INSERT and on
the new page during UPDATE.

This allows heap_page_prune_and_freeze() to set the VM the first time a
page is read after being filled with tuples. This may avoid I/O
amplification by setting the page all-visible when it is still in shared
buffers and allowing later vacuums to skip scanning the page. It also
enables index-only scans of newly inserted data much sooner.

As a side benefit, this addresses a long-standing note in heap_insert()
and heap_multi_insert(): aborted inserts can now be pruned on-access
rather than lingering until the next VACUUM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7012ee2c306..3b020d910d7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2154,6 +2154,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2180,6 +2181,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2203,25 +2206,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2231,7 +2239,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2596,8 +2603,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4139,12 +4150,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7fcfc844d20..fe9b1f16db4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -286,7 +286,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1927,17 +1928,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
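
The condition under which heap_insert() now sets pd_prune_xid can be modelled in isolation. The sketch below is standalone and makes several assumptions: the page is reduced to a bare pd_prune_xid value, SKETCH_INSERT_FROZEN is a hypothetical stand-in for HEAP_INSERT_FROZEN, and page_set_prunable() follows my understanding of the PageSetPrunable() macro (keep the oldest XID, compared modulo 2^32) rather than quoting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Hypothetical stand-in for the HEAP_INSERT_FROZEN option bit. */
#define SKETCH_INSERT_FROZEN		0x0004

/* Bootstrap and frozen XIDs sort below the first normal XID. */
bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Circular comparison of two normal XIDs. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Record xid as the prune hint unless an older hint is already present. */
void
page_set_prunable(TransactionId *pd_prune_xid, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (*pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/* Skip the hint for bootstrap-mode inserts and for frozen tuples. */
void
mark_prunable_on_insert(TransactionId *pd_prune_xid, TransactionId xid,
						int options)
{
	if (xid_is_normal(xid) && !(options & SKETCH_INSERT_FROZEN))
		page_set_prunable(pd_prune_xid, xid);
}
```

With the hint set, the next read of the page passes the prune_xid check at the top of heap_page_prune_opt() and gets the opportunity to mark the page all-visible.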




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 02:16  David Rowley <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: David Rowley @ 2026-03-31 02:16 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, 30 Mar 2026 at 06:16, Melanie Plageman
<[email protected]> wrote:
> Attached v48 does a bit more cleanup. No functional changes. I'm
> planning to push this soon. I think my remaining question is whether I
> should move the row marks and result relation bitmaps into the estate.
> I'm leaning toward not doing that and leaving them in the PlannedStmt.
> Anyway, if I want to replace the list of result relation RTIs in the
> PlannedStmt, I have to leave the bitmapset version there.

I looked at v48-0001 and it looks fine to me. I've only minor quibbles
about you using foreach() instead of foreach_int() and foreach_node()
for populating the new Bitmapsets in standard_planner().

I don't see any advantage to adding the fields to EState. There might
be if there was some performance reason, but it looks like you're only
accessing the fields when scans are initialised. It's hard to imagine
an extra pointer dereference would matter there. I didn't find any
guidance in any comments to understand if there's a best practise
here, so I assume what's there today is down to people's taste. For
me, I'd say if it's not performance critical and the executor does not
modify the field for any purpose, then keeping it in PlannedStmt is
fine. If someone thinks I'm wrong on that, then a comment at the top
of EState would be helpful.

David






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 16:19  Melanie Plageman <[email protected]>
  parent: David Rowley <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-03-31 16:19 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the reply! I have committed the patches in this thread and
marked the CF entry accordingly.

On Mon, Mar 30, 2026 at 10:17 PM David Rowley <[email protected]> wrote:
>
> I looked at v48-0001 and it looks fine to me. I've only minor quibbles
> about you using foreach() instead of foreach_int() and foreach_node()
> for populating the new Bitmapsets in standard_planner().

Good point. I forgot about those. Attached patch fixes that (since the
code was already committed).

- Melanie


Attachments:

  [text/x-patch] v1-0001-Use-foreach_int-foreach_node-for-resultRelationRe.patch (2.1K, 2-v1-0001-Use-foreach_int-foreach_node-for-resultRelationRe.patch)
From cd9ba7cce756ad870a00ce82faae41b5564980b7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 31 Mar 2026 12:06:39 -0400
Subject: [PATCH v1] Use foreach_int/foreach_node for resultRelationRelids and
 rowMarkRelids

0f4c170cf3b85e iterated through PlannerGlobal->resultRelations and
PlannerGlobal->finalrowmarks, adding their RTIs to bitmapsets in the
PlannedStmt. It used the generic foreach() instead of the more recently
introduced, preferred, type-safe specialized variants: foreach_int() and
foreach_node(). Use those variants now.

Reported-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAApHDvq_R-gNXu%2B06GQW6w_HaEMh1pezsyiCh7GNhgh%2Bh0UqMw%40mail.gmail.com
---
 src/backend/optimizer/plan/planner.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 07944612668..2b8243635a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -341,8 +341,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	Path	   *best_path;
 	Plan	   *top_plan;
 	ListCell   *lp,
-			   *lr,
-			   *lc;
+			   *lr;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -666,12 +665,12 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	 * Compute resultRelationRelids and rowMarkRelids from resultRelations and
 	 * rowMarks. These can be used for cheap membership checks.
 	 */
-	foreach(lc, glob->resultRelations)
+	foreach_int(rti, glob->resultRelations)
 		result->resultRelationRelids = bms_add_member(result->resultRelationRelids,
-													  lfirst_int(lc));
-	foreach(lc, glob->finalrowmarks)
+													  rti);
+	foreach_node(PlanRowMark, rowmark, glob->finalrowmarks)
 		result->rowMarkRelids = bms_add_member(result->rowMarkRelids,
-											   ((PlanRowMark *) lfirst(lc))->rti);
+											   rowmark->rti);
 
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 22:14  David Rowley <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: David Rowley @ 2026-03-31 22:14 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, 1 Apr 2026 at 05:19, Melanie Plageman <[email protected]> wrote:
>
> Thanks for the reply! I have committed the patches in this thread and
> marked the CF entry accordingly.

Yeah, realised that after sending the email.

> On Mon, Mar 30, 2026 at 10:17 PM David Rowley <[email protected]> wrote:
> >
> > I looked at v48-0001 and it looks fine to me. I've only minor quibbles
> > about you using foreach() instead of foreach_int() and foreach_node()
> > for populating the new Bitmapsets in standard_planner().
>
> Good point. I forgot about those. Attached patch fixes that (since the
> code was already committed).

Since it's in already, maybe it'd be worth doing something more
widespread after the freeze is over, changing just the ones new to
v19.

git diff 2652835d3efa003439ecc23d5fc3cf089c5952a6.. -- *.c | grep -E "^\+\s+foreach\("

or with a bit more context:

git diff 2652835d3efa003439ecc23d5fc3cf089c5952a6.. -- *.c | grep -E "(^\+\s+foreach\(|^---)"

The mixed node ones don't qualify, but it shouldn't be too hard to
filter those out manually.

David






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 22:55  Melanie Plageman <[email protected]>
  parent: David Rowley <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-03-31 22:55 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Mar 31, 2026 at 6:14 PM David Rowley <[email protected]> wrote:
>
> Since it's in already, maybe it'd be worth doing something more
> widespread after the freeze is over, changing just the ones new to
> v19.

That makes sense. I've put it on my todo list for post-freeze.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-03 05:00  Alexander Lakhin <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 2 replies; 143+ messages in thread

From: Alexander Lakhin @ 2026-04-03 05:00 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hello Melanie and Andres,

31.03.2026 19:19, Melanie Plageman wrote:
> Thanks for the reply! I have committed the patches in this thread and
> marked the CF entry accordingly.

I've come across an interesting failure produced starting from 378a21618:
when using a build made with CFLAGS="-DRELCACHE_FORCE_RELEASE" and
echo "io_method = sync" >/tmp/temp.config, the test run:
TEMP_CONFIG=/tmp/temp.config TESTS=temp make -s check-tests

fails as below:
--- .../src/test/regress/expected/temp.out   2026-02-13 06:15:55.887368624 +0200
+++ .../src/test/regress/results/temp.out    2026-04-03 07:51:36.735504156 +0300
@@ -493,11 +493,7 @@

  -- Check that read streams deal with lower number of pins available
  SELECT count(*), max(a) max_a, min(a) min_a, max(cnt) max_cnt FROM test_temp;
- count | max_a | min_a | max_cnt
--------+-------+-------+---------
- 10000 | 10000 |     1 |       0
-(1 row)
-
+ERROR:  no empty local buffer available
  ROLLBACK;

Could you please check whether this indicates a regression?

Best regards,
Alexander






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-05 15:42  Melanie Plageman <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-04-05 15:42 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Apr 3, 2026 at 1:00 AM Alexander Lakhin <[email protected]> wrote:
>
> I've come across an interesting failure produced starting from 378a21618:
> when using a build made with CFLAGS="-DRELCACHE_FORCE_RELEASE" and
> echo "io_method = sync" >/tmp/temp.config, the test run:
> TEMP_CONFIG=/tmp/temp.config TESTS=temp make -s check-tests
>
> fails as below:
> --- .../src/test/regress/expected/temp.out   2026-02-13 06:15:55.887368624 +0200
> +++ .../src/test/regress/results/temp.out    2026-04-03 07:51:36.735504156 +0300
> @@ -493,11 +493,7 @@
>
>   -- Check that read streams deal with lower number of pins available
>   SELECT count(*), max(a) max_a, min(a) min_a, max(cnt) max_cnt FROM test_temp;
> - count | max_a | min_a | max_cnt
> --------+-------+-------+---------
> - 10000 | 10000 |     1 |       0
> -(1 row)
> -
> +ERROR:  no empty local buffer available
>   ROLLBACK;

It has to do with the query needing an additional pin for the VM
during on-access pruning and the read stream reading ahead until there
is only one remaining buffer pin in the local pin limit (the cursor
above is already consuming much of the backend local pin limit). We
could perhaps fix this test by decreasing the pages in the relation or
increasing the backend local pin limit, but I wonder if we need to do
something more invasive to ensure that we can pin at least two
buffers.

FWIW, I don't think this can be hit with the FSM, because we release
the heap buffer pin before pinning the FSM page. Though perhaps there
are other places in the code where scanning a single buffer with the
read stream entails pinning at least one other buffer.
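
As a rough illustration of the pin accounting described above, here is a
toy model in C. It is only a sketch: the names (ToyPinBudget,
toy_read_ahead, and so on) are hypothetical and are not PostgreSQL APIs,
and the real read stream tracks pins differently. The assumption is that
a stream reads ahead until only `reserve` pins are left under the limit,
and that the consumer then needs `needed` further pins (for example one
for the page it is processing and one for the VM page).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a backend-local pin budget. */
typedef struct ToyPinBudget
{
	int			limit;		/* maximum number of local buffers that may be pinned */
	int			pinned;		/* pins currently held */
} ToyPinBudget;

/* Take one pin if the budget allows it. */
static bool
toy_try_pin(ToyPinBudget *b)
{
	if (b->pinned >= b->limit)
		return false;
	b->pinned++;
	return true;
}

/* Read ahead until only `reserve` pins remain; returns the pins taken. */
static int
toy_read_ahead(ToyPinBudget *b, int reserve)
{
	int			taken = 0;

	while (b->limit - b->pinned > reserve && toy_try_pin(b))
		taken++;
	return taken;
}

/*
 * After read-ahead, can the consumer still acquire `needed` more pins?
 * `held_elsewhere` models pins already held by another scan or cursor.
 */
static bool
toy_scan_has_enough_pins(int limit, int held_elsewhere, int reserve, int needed)
{
	ToyPinBudget b = {limit, held_elsewhere};

	assert(limit > 0 && held_elsewhere >= 0 && held_elsewhere <= limit);
	toy_read_ahead(&b, reserve);
	for (int i = 0; i < needed; i++)
	{
		if (!toy_try_pin(&b))
			return false;
	}
	return true;
}
```

In this model, leaving one pin in reserve is enough for a consumer that
needs a single extra pin, but not for one that needs two, which is the
shape of the failure discussed here.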

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-05 15:49  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2026-04-05 15:49 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-04-05 11:42:19 -0400, Melanie Plageman wrote:
> On Fri, Apr 3, 2026 at 1:00 AM Alexander Lakhin <[email protected]> wrote:
> >
> > I've come across an interesting failure produced starting from 378a21618:
> > when using a build made with CFLAGS="-DRELCACHE_FORCE_RELEASE" and
> > echo "io_method = sync" >/tmp/temp.config, the test run:
> > TEMP_CONFIG=/tmp/temp.config TESTS=temp make -s check-tests
> >
> > fails as below:
> > --- .../src/test/regress/expected/temp.out   2026-02-13 06:15:55.887368624 +0200
> > +++ .../src/test/regress/results/temp.out    2026-04-03 07:51:36.735504156 +0300
> > @@ -493,11 +493,7 @@
> >
> >   -- Check that read streams deal with lower number of pins available
> >   SELECT count(*), max(a) max_a, min(a) min_a, max(cnt) max_cnt FROM test_temp;
> > - count | max_a | min_a | max_cnt
> > --------+-------+-------+---------
> > - 10000 | 10000 |     1 |       0
> > -(1 row)
> > -
> > +ERROR:  no empty local buffer available
> >   ROLLBACK;
> 
> It has to do with the query needing an additional pin for the VM
> during on-access pruning and the read stream reading ahead until there
> is only one remaining buffer pin in the local pin limit (the cursor
> above is already consuming much of the backend local pin limit). We
> could perhaps fix this test by decreasing the pages in the relation or
> increasing the backend local pin limit, but I wonder if we need to do
> something more invasive to ensure that we can pin at least two
> buffers.

I think we should probably just have GetLocalPinLimit() return something
considerably smaller than num_temp_buffers, e.g. num_temp_buffers / 4 or
so.

There always may be more than one scan going on, so we can't ever promise that
there's at least a certain number of pins available. The main goal of the
shared pin limit is to prevent one backend from pinning disproportionally much
of s_b.  And for that eventually scaling down to just needing 1-2 pins per
scan is sufficient.
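
A minimal sketch of the idea, with hypothetical toy_* names standing in
for the real variables and functions (this is not the PostgreSQL
implementation): the limit handed out becomes a quarter of the temp
buffer pool. The floor of one pin and the "remaining" helper are
assumptions added for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the number of temp buffers in this backend. */
static int	toy_num_temp_buffers = 1024;

/*
 * Toy pin limit: a quarter of the pool rather than the whole pool.
 * The floor of 1 is an assumption so that tiny pools can still pin a page.
 */
static int
toy_local_pin_limit(void)
{
	int			limit = toy_num_temp_buffers / 4;

	return limit > 0 ? limit : 1;
}

/* Pins still available under the toy limit, given pins already held. */
static int
toy_remaining_local_pins(int already_pinned)
{
	int			limit = toy_local_pin_limit();

	assert(already_pinned >= 0);
	return already_pinned >= limit ? 0 : limit - already_pinned;
}
```

With a pool of 1024 buffers the toy limit is 256, so a single scan would
stop reading ahead long before the pool is exhausted, leaving room for
other scans and for the occasional extra pin.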

Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-06 14:30  Alexey Makhmutov <[email protected]>
  parent: Melanie Plageman <[email protected]>
  2 siblings, 1 reply; 143+ messages in thread

From: Alexey Makhmutov @ 2026-04-06 14:30 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

Hi Melanie,

Sorry for the late note for the already committed patch, but I have a 
question on the last part of the 'heap_xlog_prune_freeze' function 
related to the FSM update (it was committed in add323d -'Eliminate 
XLOG_HEAP2_VISIBLE from vacuum phase III').

Currently it contains the following logic:
	
...
Size	freespace = 0;
...
if (BufferIsValid(buffer))
{
	if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
						XLHP_HAS_DEAD_ITEMS |
						XLHP_HAS_NOW_UNUSED_ITEMS)) ||
		(vmflags & VISIBILITYMAP_VALID_BITS))
		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
...
	UnlockReleaseBuffer(buffer);
}
...
if (freespace > 0)
	XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
...
		
My question is about the last check ('freespace > 0') - do we really 
want to call 'XLogRecordPageWithFreeSpace' only if 'freespace' is 
greater than 0? As I understand, the zero value is a perfectly valid 
output of the 'PageGetHeapFreeSpace' call (i.e. page has no space or no 
free line items while we mark all rows as frozen/visible), but with the 
current implementation we will skip the FSM update in such a case. Maybe we 
need to use an additional flag (e.g. 'need_fsm_update'), set it before 
calling 'PageGetHeapFreeSpace' and then check it before the 
'XLogRecordPageWithFreeSpace' invocation?
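
To make the difference concrete, here is a small standalone sketch in C.
It is not the replay code: the toy_* functions and TOY_BLCKSZ are
hypothetical, and `page_changed` stands in for the flag checks in the
excerpt above. It only contrasts inferring the decision from the value
with carrying an explicit flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLCKSZ 8192			/* assumed block size for the example */

/* Value-based check: zero free space looks like "nothing to record". */
static bool
toy_should_update_fsm_by_value(bool page_changed, size_t page_free_space)
{
	size_t		freespace = 0;

	assert(page_free_space <= TOY_BLCKSZ);
	if (page_changed)
		freespace = page_free_space;
	return freespace > 0;
}

/* Flag-based check: the decision is independent of the measured value. */
static bool
toy_should_update_fsm_by_flag(bool page_changed, size_t page_free_space)
{
	size_t		freespace = 0;
	bool		need_fsm_update = false;

	assert(page_free_space <= TOY_BLCKSZ);
	if (page_changed)
	{
		freespace = page_free_space;
		need_fsm_update = true;
	}
	(void) freespace;			/* would be passed on to the FSM update */
	return need_fsm_update;
}
```

The two variants agree except when the page changed and has exactly zero
free space; only the flag-based one records that case.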

Thanks,
Alexey






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-06 14:44  Andres Freund <[email protected]>
  parent: Alexey Makhmutov <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2026-04-06 14:44 UTC (permalink / raw)
  To: Alexey Makhmutov <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL Hackers <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-04-06 17:30:51 +0300, Alexey Makhmutov wrote:
> Sorry for the late note for the already committed patch, but I have a
> question on the last part of the 'heap_xlog_prune_freeze' function related
> to the FSM update (it was committed in add323d -'Eliminate
> XLOG_HEAP2_VISIBLE from vacuum phase III').

FWIW, I don't think it's ever too late to look at commits, and certainly not
when it's a commit from the same release.


> Currently it contains the following logic:
> 	
> ...
> Size	freespace = 0;
> ...
> if (BufferIsValid(buffer))
> {
> 	if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
> 						XLHP_HAS_DEAD_ITEMS |
> 						XLHP_HAS_NOW_UNUSED_ITEMS)) ||
> 		(vmflags & VISIBILITYMAP_VALID_BITS))
> 		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
> ...
> 	UnlockReleaseBuffer(buffer);
> }
> ...
> if (freespace > 0)
> 	XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
> ...
> 		
> My question is about the last check ('freespace > 0') - do we really want to
> call 'XLogRecordPageWithFreeSpace' only if 'freespace' is greater than 0? As
> I understand, the zero value is a perfectly valid output of the
> 'PageGetHeapFreeSpace' call (i.e. page has no space or no free line items
> while we mark all rows as frozen/visible), but with the current
> implementation we will skip FSM update in such case.

I don't have a strong opinion on this, but I think it's pretty defensible to
record only when there's free space. The whole goal of updating the FSM during
recovery is to make sure that free space can be found fairly quickly after
promotion (it's also beneficial in some crash recovery cases, but not that
much).

If the page filled up during an insert / update / delete, we will have updated
the FSM with that information at that point:

	/*
	 * If the page is running low on free space, update the FSM as well.
	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
	 * better than that without knowing the fill-factor for the table.
	 *
	 * XXX: Don't do this if the page was restored from full page image. We
	 * don't bother to update the FSM in that case, it doesn't need to be
	 * totally accurate anyway.
	 */
	if (action == BLK_NEEDS_REDO && freespace < BLCKSZ / 5)
		XLogRecordPageWithFreeSpace(target_locator, blkno, freespace);


The reason to update the FSM after something pruning / vacuuming related is
that there now might be *more* space available than before; it shouldn't
shrink.
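
For reference, the arithmetic of the 20% check quoted above can be
sketched in isolation. This assumes the default 8192-byte block size;
TOY_BLCKSZ and the function name are hypothetical, and `needs_redo`
stands in for the `action == BLK_NEEDS_REDO` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLCKSZ 8192			/* assumed default block size */

/*
 * Mirror of the quoted condition: update the FSM during insert replay only
 * if the page was actually redone and has less than 20% free space.
 */
static bool
toy_insert_replay_updates_fsm(bool needs_redo, size_t freespace)
{
	assert(freespace <= TOY_BLCKSZ);
	return needs_redo && freespace < TOY_BLCKSZ / 5;
}
```

With integer division, TOY_BLCKSZ / 5 is 1638 bytes, so a page with
1637 free bytes triggers an update and a page with 1638 does not.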

Obviously the FSM is not crashsafe, so updating it with 0 during replay could
avoid some unnecessary page reads after a promotion. But I'm not sure that
that's particularly worth optimizing for.


With all that said, I'm somewhat doubtful that freespace > 0 filters out a
meaningful amount of freespace updates, it's rare for pages to be that full.

Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-06 15:32  Alexey Makhmutov <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Alexey Makhmutov @ 2026-04-06 15:32 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL Hackers <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

Hi Andres,

On 4/6/26 17:44, Andres Freund wrote:
> I don't have a strong opinion on this, but I think it's pretty defensible to
> record only when there's free space. The whole goal of updating the FSM during
> recovery is to make sure that free space can be found fairly quickly after
> promotion (it's also beneficial in some crash recovery cases, but not that
> much).
> ... 
> Obviously the FSM is not crashsafe, so updating it with 0 during replay could
> avoid some unnecessary page reads after a promotion. But I'm not sure that
> that's particularly worth optimizing for.

This makes sense. I'd just like to add some context here: I was checking 
the FSM update case in the scope of the thread 
https://www.postgresql.org/message-id/flat/[email protected], 
in which I was specifically looking at the case where outdated FSM data 
(showing lots of free space) on a standby causes a significant 
performance hit after switchover. One example is a table with 
fillfactor<=80 that has had bulk row deletes followed by 
insertions. In this case a (mostly) empty FSM block may be delivered to 
the standby via FPI, but FSM updates for subsequent inserts may be lost 
due to the 20% heuristic. Moreover, updates to the FSM may be lost even 
for blocks that are more than 80% full, due to the missing dirty flag 
described in that thread.
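
The fillfactor interaction can be sketched numerically. This is a toy
calculation, assuming an 8192-byte block and assuming that an insert
honoring fillfactor leaves at least BLCKSZ * (100 - fillfactor) / 100
bytes free; the toy_* names are hypothetical and the real insertion path
has more cases (for example, empty pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLCKSZ 8192			/* assumed default block size */

/* Free space an insert tries to leave behind for a fillfactor in percent. */
static size_t
toy_target_free_space(int fillfactor)
{
	assert(fillfactor >= 10 && fillfactor <= 100);
	return (size_t) TOY_BLCKSZ * (100 - fillfactor) / 100;
}

/* The replay-side heuristic: free space below 20% of the block. */
static bool
toy_below_replay_threshold(size_t freespace)
{
	return freespace < TOY_BLCKSZ / 5;
}

/*
 * Can a page filled only by fillfactor-honoring inserts ever fall below
 * the replay threshold?  Checked at the least free space such an insert
 * may leave behind.
 */
static bool
toy_fillfactor_can_trigger_update(int fillfactor)
{
	return toy_below_replay_threshold(toy_target_free_space(fillfactor));
}
```

Under these assumptions, fillfactor 80 reserves 1638 bytes, which is not
below the 1638-byte threshold, so insert replay would never update the
FSM for such pages; fillfactor 81 and above can cross it.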

In my understanding the FSM update during processing of the 
'heap_xlog_visible' function on standby was kind of 'last line of 
defense' for any corner case scenario with FSM update (as block would 
not be visited by the vacuum process once it's marked as 'all visible') 
and it was introduced in 
https://www.postgresql.org/message-id/[email protected] 
(ab7dbd681) specifically for this purpose. Now that the logic of 
'heap_xlog_visible' has been merged into 'heap_xlog_prune_freeze', this 
task is carried out by that function.

I fully agree that having exactly zero-space seems to be a very uncommon 
situation (and probably not reproducible with tables having 
fillfactor<=80). I've just noted that such case was processed by the old 
logic in the 'heap_xlog_visible', while current implementation in 
'heap_xlog_prune_freeze' skips it.

Thanks,
Alexey






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-13 16:17  Melanie Plageman <[email protected]>
  parent: Alexey Makhmutov <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-04-13 16:17 UTC (permalink / raw)
  To: Alexey Makhmutov <[email protected]>; +Cc: Andres Freund <[email protected]>; PostgreSQL Hackers <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Apr 6, 2026 at 11:32 AM Alexey Makhmutov
<[email protected]> wrote:
>
> This makes sense. I'd like just to put some context here: I was checking
> the FSM update case in scope of the thread
> https://www.postgresql.org/message-id/flat/[email protected],
> in which I was specifically looking at the case with outdated FSM data
> (showing lots of free space) on standby causing a significant
> performance hit after switchover. As example this include case with
> table having fillfactor<=80 which has prior bulk rows deletes +
> insertion. In this case (mostly) empty FSM block may be delivered to
> standby via FPI, but subsequent inserts may be lost due to the 20%
> heuristic. Moreover, updates to FSM may be lost even for blocks filled
> for more than 80% due to missing dirty flag as described in that thread.
>
> In my understanding the FSM update during processing of the
> 'heap_xlog_visible' function on standby was kind of 'last line of
> defense' for any corner case scenario with FSM update (as block would
> not be visited by the vacuum process once it's marked as 'all visible')
> and it was introduced in
> https://www.postgresql.org/message-id/[email protected]
> (ab7dbd681) specifically for this purpose. Now, as logic of
> 'heap_xlog_visible' is merged into 'heap_xlog_prune_freeze', so this
> task is carried by this function.
>
> I fully agree that having exactly zero-space seems to be a very uncommon
> situation (and probably not reproducible with tables having
> fillfactor<=80). I've just noted that such case was processed by the old
> logic in the 'heap_xlog_visible', while current implementation in
> 'heap_xlog_prune_freeze' skips it.

The scenario causing inaccurate freespace maps after promotion is
technically possible though improbable. Moreover, I don't see a
downside to changing it. Patch I plan to commit is attached.

I don't quite understand why heap_xlog_insert() considers whether the
heap page was an FPI before updating the FSM though. I know we need
some heuristic to avoid doing it for every record, but the FPI
consideration doesn't make sense to me.

- Melanie


Attachments:

  [text/x-patch] 0001-Update-FSM-when-updating-VM-even-if-freespace-is-zer.patch (2.4K, 2-0001-Update-FSM-when-updating-VM-even-if-freespace-is-zer.patch)
  download | inline diff:
From eb9403676d61c32f45139ef6559366182afd608f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 13 Apr 2026 11:50:32 -0400
Subject: [PATCH] Update FSM when updating VM even if freespace is zero

add323da40a started updating the visibility map in the same WAL record
as pruning and freezing. This included updating the freespace map during
replay, which we've done since ab7dbd681.

add323da40a, however, conditioned doing so on there being > 0 freespace,
which differed from the previous state. The FSM is not WAL-logged and is
instead updated heuristically on standbys. In rare cases, if the FSM is
not updated while replaying inserts, there is 0 freespace, and vacuum
replay doesn't update the FSM when setting the page
all-visible/all-frozen, after the standby is promoted and runs vacuum,
it may skip those pages and then propagate overly optimistic numbers up
the FSM, causing slowness when searching for freespace for new tuples.

Fix it by always updating the FSM when replaying setting the VM.

Author: Melanie Plageman <[email protected]>
Reported-by: Alexey Makhmutov <[email protected]>
Discussion: https://postgr.es/m/ead2f110-c736-48f5-99e1-023dc9acbf0b%40postgrespro.ru
---
 src/backend/access/heap/heapam_xlog.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f3f419d3dc1..9ed7024e814 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -38,6 +38,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		vmbuffer = InvalidBuffer;
 	uint8		vmflags = 0;
 	Size		freespace = 0;
+	bool		do_update_fsm = false;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -211,7 +212,10 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 							XLHP_HAS_DEAD_ITEMS |
 							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
 			(vmflags & VISIBILITYMAP_VALID_BITS))
+		{
 			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+			do_update_fsm = true;
+		}
 
 		/*
 		 * We want to avoid holding an exclusive lock on the heap buffer while
@@ -248,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	if (BufferIsValid(vmbuffer))
 		UnlockReleaseBuffer(vmbuffer);
 
-	if (freespace > 0)
+	if (do_update_fsm)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-16 16:19  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-04-16 16:19 UTC (permalink / raw)
  To: Alexey Makhmutov <[email protected]>; +Cc: Andres Freund <[email protected]>; PostgreSQL Hackers <[email protected]>; Kirill Reshke <[email protected]>; Andrey Borodin <[email protected]>; Robert Haas <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Apr 13, 2026 at 12:17 PM Melanie Plageman
<[email protected]> wrote:
>
> The scenario causing inaccurate freespace maps after promotion is
> technically possible though improbable. Moreover, I don't see a
> downside to changing it. Patch I plan to commit is attached.

I've committed this fix in b4c1b2be300.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-16 20:21  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-04-16 20:21 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sun, Apr 5, 2026 at 11:49 AM Andres Freund <[email protected]> wrote:
>
> I think we should probably just have GetLocalPinLimit() return something
> considerably smaller than num_temp_buffers, e.g. num_temp_buffers / 4 or
> so.

I think num_temp_buffers / 4 seems reasonable for GetLocalPinLimit().
We'd also need to make this change in GetAdditionalLocalPinLimit().

Making this change fixes the specific case Alexander pointed out.

We will likely see a performance impact because this will keep the
readahead distance substantially lower for temp table cases with only
one read stream.

> There always may be more than one scan going on, so we can't ever promise that
> there's at least a certain number of pins available. The main goal of the
> shared pin limit is to prevent one backend from pinning disproportionally much
> of s_b.  And for that eventually scaling down to just needing 1-2 pins per
> scan is sufficient.

Regarding the last sentence, "And for that eventually scaling down to
just needing 1-2 pins per scan is sufficient." -- how do you mean to
relate that to what we will do in the local buffers case?

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-18 15:00  Alexander Lakhin <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Alexander Lakhin @ 2026-04-18 15:00 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hello Melanie and Andres,

03.04.2026 08:00, Alexander Lakhin wrote:
>
> 31.03.2026 19:19, Melanie Plageman wrote:
>> Thanks for the reply! I have committed the patches in this thread and
>> marked the CF entry accordingly.
>
> I've come across an interesting failure produced starting from 378a21618:
> ...

I've discovered one more behaviour change introduced in 378a21618. I
investigated yesterday's skink failure [1]:
# --- /home/bf/bf-build/skink-master/HEAD/pgsql/contrib/btree_gist/expected/enum.out 2025-06-23 20:17:56.295775456 +0200
# +++ /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/btree_gist/regress/results/enum.out 2026-04-17 22:35:37.212061309 +0200
# @@ -83,12 +83,10 @@
#
#  EXPLAIN (COSTS OFF)
#  SELECT count(*) FROM enumtmp WHERE a >= 'g'::rainbow;
# -                  QUERY PLAN
# ------------------------------------------------
# +                   QUERY PLAN
# +------------------------------------------------
#   Aggregate
# -   ->  Bitmap Heap Scan on enumtmp
# -         Recheck Cond: (a >= 'g'::rainbow)
# -         ->  Bitmap Index Scan on enumidx
# -               Index Cond: (a >= 'g'::rainbow)
# -(5 rows)
# +   ->  Index Only Scan using enumidx on enumtmp
# +         Index Cond: (a >= 'g'::rainbow)
# +(3 rows)
#
# 1 of 32 tests failed.

pgsql.build/testrun/btree_gist/regress/log/postmaster.log contains
2026-04-17 22:35:36.909 CEST autovacuum worker[4020330] LOG: automatic analyze of table "regression_btree_gist.public.enumtmp"
     avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
     buffer usage: 128 hits, 0 reads, 0 dirtied
     WAL usage: 2 records, 0 full page images, 322 bytes, 0 full page image bytes, 0 buffers full
     system usage: CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.11 s

I managed to reproduce it locally under Valgrind on a slowed-down VM, so
that the enum test takes ~10 sec. With
+select c.relname,c.relpages,c.reltuples,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname in ('enumtmp', 'enumidx');
added to the test for diagnostics and the test repeated 100 times, I got:
...
ok 46        - enum                                    10635 ms
ok 47        - enum                                    10559 ms
# diff -U3 /home/vagrant/postgres/contrib/btree_gist/expected/enum.out /home/vagrant/postgres/contrib/btree_gist/results/enum.out
# --- /home/vagrant/postgres/contrib/btree_gist/expected/enum.out 2026-04-18 11:41:17.224063241 +0000
# +++ /home/vagrant/postgres/contrib/btree_gist/results/enum.out 2026-04-18 11:52:43.870049782 +0000
# @@ -91,18 +91,16 @@
#  where c.relname in ('enumtmp', 'enumidx');
#   relname | relpages | reltuples | autovacuum_count | autoanalyze_count
# ---------+----------+-----------+------------------+-------------------
# - enumtmp |        3 |       595 |                0 |                 0
# + enumtmp |        3 |       595 |                0 |                 1
#   enumidx |        4 |       595 |                  |
#  (2 rows)
#
#  EXPLAIN (COSTS OFF)
#  SELECT count(*) FROM enumtmp WHERE a >= 'g'::rainbow;
# -                  QUERY PLAN
# ------------------------------------------------
# +                   QUERY PLAN
# +------------------------------------------------
#   Aggregate
# -   ->  Bitmap Heap Scan on enumtmp
# -         Recheck Cond: (a >= 'g'::rainbow)
# -         ->  Bitmap Index Scan on enumidx
# -               Index Cond: (a >= 'g'::rainbow)
# -(5 rows)
# +   ->  Index Only Scan using enumidx on enumtmp
# +         Index Cond: (a >= 'g'::rainbow)
# +(3 rows)
#
not ok 48    - enum                                    10596 ms
ok 49        - enum                                    11693 ms
ok 50        - enum                                    11098 ms
...
# 6 of 131 tests failed.

I could also reproduce the same diff with just:
--- a/contrib/btree_gist/sql/enum.sql
+++ b/contrib/btree_gist/sql/enum.sql
@@ -40,2 +40,3 @@ SELECT count(*) FROM enumtmp WHERE a > 'g'::rainbow;

+ANALYZE enumtmp;
  EXPLAIN (COSTS OFF)

It's not reproduced at 378a21618~1, though.

Could you please look if this can be fixed?

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-04-17%2019%3A10%3A50

Best regards,
Alexander






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-18 16:25  Andres Freund <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2026-04-18 16:25 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Melanie Plageman <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-04-18 18:00:00 +0300, Alexander Lakhin wrote:
> Hello Melanie and Andres,
> 
> 03.04.2026 08:00, Alexander Lakhin wrote:
> > 
> > 31.03.2026 19:19, Melanie Plageman wrote:
> > > Thanks for the reply! I have committed the patches in this thread and
> > > marked the CF entry accordingly.
> > 
> > I've come across an interesting failure produced starting from 378a21618:
> > ...
> 
> I've discovered one more behaviour change introduced in 378a21618. I
> investigated yesterday's skink failure [1]:
> # --- /home/bf/bf-build/skink-master/HEAD/pgsql/contrib/btree_gist/expected/enum.out 2025-06-23 20:17:56.295775456 +0200
> # +++ /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/btree_gist/regress/results/enum.out 2026-04-17 22:35:37.212061309 +0200
> # @@ -83,12 +83,10 @@
> #
> #  EXPLAIN (COSTS OFF)
> #  SELECT count(*) FROM enumtmp WHERE a >= 'g'::rainbow;

Random: I wonder if the author of this intended this to be a temp table, based
on the name? That'd prevent any concurrent autovacuums/analyzes from changing
anything.


> # --- /home/vagrant/postgres/contrib/btree_gist/expected/enum.out 2026-04-18 11:41:17.224063241 +0000
> # +++ /home/vagrant/postgres/contrib/btree_gist/results/enum.out 2026-04-18 11:52:43.870049782 +0000
> # @@ -91,18 +91,16 @@
> #  where c.relname in ('enumtmp', 'enumidx');
> #   relname | relpages | reltuples | autovacuum_count | autoanalyze_count
> # ---------+----------+-----------+------------------+-------------------
> # - enumtmp |        3 |       595 |                0 |                 0
> # + enumtmp |        3 |       595 |                0 |                 1
> #   enumidx |        4 |       595 |                  |
> #  (2 rows)
> #
> #  EXPLAIN (COSTS OFF)
> #  SELECT count(*) FROM enumtmp WHERE a >= 'g'::rainbow;
> # -                  QUERY PLAN
> # ------------------------------------------------
> # +                   QUERY PLAN
> # +------------------------------------------------
> #   Aggregate
> # -   ->  Bitmap Heap Scan on enumtmp
> # -         Recheck Cond: (a >= 'g'::rainbow)
> # -         ->  Bitmap Index Scan on enumidx
> # -               Index Cond: (a >= 'g'::rainbow)
> # -(5 rows)
> # +   ->  Index Only Scan using enumidx on enumtmp
> # +         Index Cond: (a >= 'g'::rainbow)
> # +(3 rows)
> #
> not ok 48    - enum                                    10596 ms

The interesting column to show here would presumably be relallvisible.

What I assume is happening is that occasionally analyze now sees enough
all-visible pages (due to on-access pruning marking the pages all-visible) for
the planner to consider the index only scan worthwhile, whereas before that
didn't happen (or happened only very rarely).

Maybe I'm daft, but what would prevent this from happening before? The path
for it would be a bit more complicated, you'd have to have an autovacuum
instead of just an analyze - but that seems possible. It might require running
against a pre-existing install to be likely enough.


> It's not reproduced at 378a21618~1, though.
> 
> Could you please look if this can be fixed?

When you say fix, I assume you mean address the test instability, rather than
actual code changes?

Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-18 16:33  Andres Freund <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Andres Freund @ 2026-04-18 16:33 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-04-16 16:21:35 -0400, Melanie Plageman wrote:
> On Sun, Apr 5, 2026 at 11:49 AM Andres Freund <[email protected]> wrote:
> >
> > I think we should probably just have GetLocalPinLimit() return something
> > considerably smaller than num_temp_buffers, e.g. num_temp_buffers / 4 or
> > so.
> 
> I think num_temp_buffers / 4 seems reasonable for GetLocalPinLimit().
> We'd also need to make this change in GetAdditionalLocalPinLimit().

Right.


> We will likely see a performance impact because this will
> keep the readahead distance substantially lower for temp table cases
> with only one read stream.

I don't think it's likely to be practically relevant. Unless you use a toy
sized temp_buffers - in which case readahead won't be a relevant performance
bottleneck - the pinned limit will often even be higher than with shared
buffers (due to the shared buffers pin limit being divided by
MaxBackends). Even at the default temp_buffers=1024/8MB, the limit would be
256. That's quite high for a single scan in such a tiny pool.
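
To put rough numbers on that comparison, here is a minimal sketch in C. It
is not PostgreSQL code: the helper names are made up for illustration, the
shared-buffers side is simplified to an even split of the pool across
backends, and shared_buffers = 16384 pages with 128 backends is an assumed
configuration. Only temp_buffers = 1024 and the "/ 4" divisor come from the
discussion above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the proposed local (temp buffer) pin limit,
 * a quarter of the backend's temp buffer pool.
 */
static uint32_t
sketch_local_pin_limit(uint32_t num_temp_buffers)
{
	return num_temp_buffers / 4;
}

/*
 * Hypothetical helper: simplified model of a shared pin limit in which
 * the shared pool is split evenly across all possible backends.
 */
static uint32_t
sketch_shared_pin_limit(uint32_t shared_buffers, uint32_t max_backends)
{
	if (max_backends == 0)
		return shared_buffers;	/* avoid dividing by zero in the model */
	return shared_buffers / max_backends;
}
```

With the assumed values, sketch_local_pin_limit(1024) gives 256, while
sketch_shared_pin_limit(16384, 128) gives 128, so in this configuration the
temp-buffer limit is the larger of the two.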


> > There always may be more than one scan going on, so we can't ever promise that
> > there's at least a certain number of pins available. The main goal of the
> > shared pin limit is to prevent one backend from pinning disproportionally much
> > of s_b.  And for that eventually scaling down to just needing 1-2 pins per
> > scan is sufficient.
> 
> With the last sentence "And for that eventually scaling down to just
> needing 1-2 pins per scan is sufficient." -- how do you mean to relate
> that to what we will do with local buffers case?

Just explaining why we shouldn't see these limits as hard limits that may
never be exceeded, but as caps that make problems less likely.

I guess I could see an argument for doing something more complicated for temp
buffers than num_temp_buffers / 4, e.g.
  Max(1, (num_temp_buffers - NLocalPinnedBuffers) / 4)
so that we get more conservative the more scans are concurrently in progress.

But I'd not go there right now, that seems like a more complicated project
(and we'd presumably want to do something roughly similar for the s_b case).
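
A minimal sketch of that more conservative variant, purely for
illustration. The function name and parameters are hypothetical, and the
sketch reads the expression as having a floor of one pin (the larger of 1
and the quotient), on the assumption that a scan should always be allowed
at least one buffer.

```c
#include <stdint.h>

/*
 * Hypothetical sketch, not PostgreSQL code: a local pin limit that
 * shrinks as more local buffers are already pinned, but never drops
 * below one pin.
 */
static uint32_t
sketch_adaptive_local_pin_limit(uint32_t num_temp_buffers,
								uint32_t n_local_pinned)
{
	uint32_t	unpinned;
	uint32_t	limit;

	/* Guard against underflow if the pinned count exceeds the pool. */
	if (n_local_pinned >= num_temp_buffers)
		unpinned = 0;
	else
		unpinned = num_temp_buffers - n_local_pinned;

	limit = unpinned / 4;

	/* Floor of one pin, so a scan can always make progress. */
	return (limit > 1) ? limit : 1;
}
```

With 1024 temp buffers this yields 256 when nothing is pinned, 128 when 512
buffers are already pinned, and 1 once nearly all buffers are pinned.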

Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-18 18:00  Alexander Lakhin <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Alexander Lakhin @ 2026-04-18 18:00 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Melanie Plageman <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

18.04.2026 19:25, Andres Freund wrote:
> The interesting column to show here would presumably be relallvisible.
>
> What I assume is happening is that occasionally analyze now sees enough
> all-visible pages (due to on-access pruning marking the pages all-visible) for
> the planner to consider the index only scan worthwhile, whereas before that
> didn't happen (or happened only very rarely).

Indeed, with c.relallvisible added, I can see:
--- .../contrib/btree_gist/expected/enum.out        2026-04-18 19:37:51.041565543 +0300
+++ .../contrib/btree_gist/results/enum.out 2026-04-18 19:40:59.077264981 +0300
@@ -88,18 +88,16 @@
  where c.relname in ('enumtmp', 'enumidx');
   relname | relpages | reltuples | autovacuum_count | autoanalyze_count | relallvisible
  ---------+----------+-----------+------------------+-------------------+---------------
- enumtmp |        3 |       595 |                0 |                 0 |             0
+ enumtmp |        3 |       595 |                0 |                 0 |             2
   enumidx |        4 |       595 |                  |                   |             0
  (2 rows)

  EXPLAIN (COSTS OFF)
  SELECT count(*) FROM enumtmp WHERE a >= 'g'::rainbow;
-                  QUERY PLAN
------------------------------------------------
+                   QUERY PLAN
+------------------------------------------------
   Aggregate
-   ->  Bitmap Heap Scan on enumtmp
-         Recheck Cond: (a >= 'g'::rainbow)
-         ->  Bitmap Index Scan on enumidx
-               Index Cond: (a >= 'g'::rainbow)
-(5 rows)
+   ->  Index Only Scan using enumidx on enumtmp
+         Index Cond: (a >= 'g'::rainbow)
+(3 rows)

At 378a21618~1, it stays zero.

> Maybe I'm daft, but what would prevent this from happening before? The path
> for it would be a bit more complicated, you'd have to have an autovacuum
> instead of just an analyze - but that seems possible. It might require running
> against a pre-existing install to be likely enough.

Yes, with VACUUM enumtmp; instead of ANALYZE enumtmp; the plan change is
reproduced at 378a21618~1:
@@ -88,18 +88,16 @@
  where c.relname in ('enumtmp', 'enumidx');
   relname | relpages | reltuples | autovacuum_count | autoanalyze_count | relallvisible
  ---------+----------+-----------+------------------+-------------------+---------------
- enumtmp |        3 |       595 |                0 |                 0 |             0
+ enumtmp |        3 |       595 |                0 |                 0 |             3
   enumidx |        4 |       595 |                  |                   |             0
  (2 rows)

  EXPLAIN (COSTS OFF)
  SELECT count(*) FROM enumtmp WHERE a >= 'g'::rainbow;
-                  QUERY PLAN
------------------------------------------------
+                   QUERY PLAN
+------------------------------------------------
   Aggregate
-   ->  Bitmap Heap Scan on enumtmp
-         Recheck Cond: (a >= 'g'::rainbow)
-         ->  Bitmap Index Scan on enumidx
-               Index Cond: (a >= 'g'::rainbow)
-(5 rows)
+   ->  Index Only Scan using enumidx on enumtmp
+         Index Cond: (a >= 'g'::rainbow)
+(3 rows)

And this diff is produced even at f7946a92 (from 2017-03-21), which added
the test case.

So, given that this is the only failure of btree_gist in at least the last
two years, it looks like the probability of the table being vacuumed there is
much lower than that of it being analyzed.
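
As a rough illustration of why relallvisible changes the plan, here is a
simplified sketch in C of how the all-visible fraction feeds into the
number of heap pages an index only scan is still expected to visit. This
is a model under stated assumptions, not the planner's code: the function
names are invented here, and the real cost model has more inputs (and
rounds the page count up).

```c
/*
 * Hypothetical sketch: fraction of heap pages marked all-visible,
 * derived from pg_class.relpages and pg_class.relallvisible and
 * clamped to the range [0, 1].
 */
static double
sketch_all_visible_fraction(double relpages, double relallvisible)
{
	double		frac;

	if (relpages <= 0 || relallvisible <= 0)
		return 0.0;
	frac = relallvisible / relpages;
	return (frac > 1.0) ? 1.0 : frac;
}

/*
 * Hypothetical sketch: heap pages an index only scan would still have
 * to fetch, given how many pages the matching tuples touch.
 */
static double
sketch_heap_pages_to_fetch(double pages_touched, double allvisfrac)
{
	return pages_touched * (1.0 - allvisfrac);
}
```

For the 3-page enumtmp table, relallvisible = 0 leaves all 3 heap pages to
be fetched, relallvisible = 2 leaves about 1, and relallvisible = 3 leaves
none, which is consistent with the index only scan becoming attractive
once relallvisible is updated.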

>> Could you please look if this can be fixed?
> When you say fix, I assume you mean address the test instability, rather than
> actual code changes?

Sure, I didn't mean the new behavior is wrong. Changing that table to a
temporary one would probably work, but I wonder if there are other queries
whose plans can change for the same reason.

Best regards,
Alexander






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-20 15:48  Melanie Plageman <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-04-20 15:48 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sat, Apr 18, 2026 at 12:33 PM Andres Freund <[email protected]> wrote:
>
> I guess I could see an argument for doing something more complicated for temp
> buffers than num_temp_buffers / 4, e.g.
>   Max(1, (num_temp_buffers - NLocalPinnedBuffers) / 4)
> so that we get more conservative the more scans are concurrently in progress.
>
> But I'd not go there right now, that seems like a more complicated project
> (and we'd presumably want to do something roughly similar for the s_b case).

With shared buffers, while it is true you'd ideally leave the backend
headroom for other read streams etc, it won't error out the way the
temp table case does unless we've actually pinned all shared buffers.
It will simply slow down the read ahead of the competing read streams.

Attached is what I'm thinking of committing.

- Melanie


Attachments:

  [text/x-patch] Make-local-buffers-pin-limit-more-conservative.patch (2.4K, 2-Make-local-buffers-pin-limit-more-conservative.patch)
  inline diff:
From 7517a498cb1cbb23f60189019b3a46b27db35f72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 20 Apr 2026 11:14:31 -0400
Subject: [PATCH] Make local buffers pin limit more conservative
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GetLocalPinLimit() and GetAdditionalLocalPinLimit(), currently in use
only by the read stream, previously allowed a backend to pin all
num_temp_buffers local buffers. This meant that the read stream could
use every available local buffer for read-ahead, leaving none for other
concurrent pin-holders like other read streams and related buffers like
the visibility map buffer needed during on-access pruning.

Cap the local pin limit to num_temp_buffers / 4, providing some
headroom. This doesn't guarantee that all needed pins will be available
— for example, a backend can still open more cursors than there are
buffers — but it makes it less likely that read-ahead will exhaust the
pool.

Note that these functions are not limited by definition to use in the
read stream; however, this cap should be appropriate in other contexts.

Author: Melanie Plageman <[email protected]>
Reported-by: Alexander Lakhin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/97529f5a-ec10-46b1-ab50-4653126c6889%40gmail.com
---
 src/backend/storage/buffer/localbuf.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 396da84b25c..24ef95ecb10 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -307,16 +307,25 @@ GetLocalVictimBuffer(void)
 uint32
 GetLocalPinLimit(void)
 {
-	/* Every backend has its own temporary buffers, and can pin them all. */
-	return num_temp_buffers;
+	/*
+	 * Every backend has its own temporary buffers, but we leave headroom to
+	 * avoid running out of pins for discretionary demand, such as for
+	 * read-ahead.
+	 */
+	return num_temp_buffers / 4;
 }
 
 /* see GetAdditionalPinLimit() */
 uint32
 GetAdditionalLocalPinLimit(void)
 {
+	uint32		total = GetLocalPinLimit();
+
 	Assert(NLocalPinnedBuffers <= num_temp_buffers);
-	return num_temp_buffers - NLocalPinnedBuffers;
+
+	if (NLocalPinnedBuffers >= total)
+		return 0;
+	return total - NLocalPinnedBuffers;
 }
 
 /* see LimitAdditionalPins() */
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-20 16:18  Melanie Plageman <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  0 siblings, 2 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-04-20 16:18 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sat, Apr 18, 2026 at 2:00 PM Alexander Lakhin <[email protected]> wrote:
>
> 18.04.2026 19:25, Andres Freund wrote:
>
> >> Could you please look if this can be fixed?
> > When you say fix, I assume you mean address the test instability, rather than
> > actual code changes?
>
> Sure, I didn't mean the new behavior is wrong. Probably changing that
> table to temporary would work

Yes, I think changing it to a temp table is the easiest fix. We could
also do autovacuum_enabled=false, I think, but making it a temp table
seems cleanest.

I wonder if we should move the EXPLAIN test above the results queries,
then throw in a vacuum in between some of them so we exercise btree
gist as a bitmap heap scan and as an index only scan. It could provide
a little bit more coverage? Or maybe that isn't actually extra
coverage. I'm not sure.

> but I wonder if there are other queries
> whose plans can change for the same reason.

I think we'll have to take this on a case-by-case basis when we see
failures. While it is certainly possible other tests just rely on
autovacuum not having run and set the page all-visible, many of them
probably have already had to account for that.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-20 19:00  Alexander Lakhin <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 143+ messages in thread

From: Alexander Lakhin @ 2026-04-20 19:00 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hello Melanie and Andres,

20.04.2026 19:18, Melanie Plageman wrote:
>> but I wonder if there are other queries
>> whose plans can change for the same reason.
> I think we'll have to take this on a case-by-case basis when we see
> failures. While it is certainly possible other tests just rely on
> autovacuum not having run and set the page all-visible, many of them
> probably have already had to account for that.

Thank you for paying attention to this!

I think I found another test that suffers from autoanalyze with the new
behavior: [1], [2].

Initially I reproduced this diff on a slow armv7 device after many
iterations of `make check` with:
autovacuum_naptime = 1
autovacuum_analyze_threshold = 1
debug_parallel_query = 'regress'

But now I see that it can be reproduced on an ordinary machine with just:
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -208,2 +208,3 @@ execute test_mode_pp(1); -- 2x
  execute test_mode_pp(1); -- 3x
+analyze test_mode;
  execute test_mode_pp(1); -- 4x
(and expected/plancache.out updated)

and `make check` running in a loop. It failed for me on iterations 5, 4,
10 (as far as I can see, analyze does not update relallvisible every time):
# parallel group (18 tests):  prepare xml conversion plancache limit returning copy2 polymorphism sequence rowtypes largeobject temp rangefuncs with truncate domain plpgsql alter_table
# diff -U3 .../src/test/regress/expected/plancache.out .../src/test/regress/results/plancache.out
# --- .../src/test/regress/expected/plancache.out    2026-04-20 21:35:30.677775398 +0300
# +++ .../src/test/regress/results/plancache.out     2026-04-20 21:43:49.324492302 +0300
# @@ -374,11 +374,11 @@
#
#  -- we should now get a really bad plan
#  explain (costs off) execute test_mode_pp(2);
# -         QUERY PLAN
# ------------------------------
# +                        QUERY PLAN
# +----------------------------------------------------------
#   Aggregate
# -   ->  Seq Scan on test_mode
# -         Filter: (a = $1)
# +   ->  Index Only Scan using test_mode_a_idx on test_mode
# +         Index Cond: (a = $1)
#  (3 rows)
#
#  -- but we can force a custom plan

The same modified test survived 50 iterations at 378a21618~1.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2026-04-07%2012%3A45%3A07
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2026-04-12%2022%3A45%3A06

Best regards,
Alexander


* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-20 20:28  Melanie Plageman <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  0 siblings, 1 reply; 143+ messages in thread

From: Melanie Plageman @ 2026-04-20 20:28 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Apr 20, 2026 at 3:00 PM Alexander Lakhin <[email protected]> wrote:
>
> I think I found another test that suffers from autoanalyze with the new
> behavior: [1], [2].

Thanks for continuing to look for these!

> Initially I reproduced this diff on a slow armv7 device after many
> iterations of `make check` with:
> autovacuum_naptime = 1
> autovacuum_analyze_threshold = 1
> debug_parallel_query = 'regress'
>
> But now I see that it can be reproduced on an ordinary machine with just:
> --- a/src/test/regress/sql/plancache.sql
> +++ b/src/test/regress/sql/plancache.sql
> @@ -208,2 +208,3 @@ execute test_mode_pp(1); -- 2x
>  execute test_mode_pp(1); -- 3x
> +analyze test_mode;
>  execute test_mode_pp(1); -- 4x
> (and expected/plancache.out updated)
>
> and `make check` running in a loop. It failed for me on iterations 5, 4,
> 10 (as far as I can see, analyze does not update relallvisible every time):

If you do vacuum test_mode instead, it should fail reliably.

I think we can avoid having relallvisible updated by doing two things:
1) moving the analyze test_mode above the create index, because the
create index will scan the table and could set pages all-visible, and
the analyze may then update the statistics
2) creating the table with autovacuum_enabled = false, so that vacuum
and analyze cannot run after one of the other table scans has set some
pages all-visible

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-21 15:07  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-04-21 15:07 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Apr 20, 2026 at 11:48 AM Melanie Plageman
<[email protected]> wrote:
>
> With shared buffers, while it is true you'd ideally leave the backend
> headroom for other read streams etc, it won't error out the way the
> temp table case does unless we've actually pinned all shared buffers.
> It will simply slow down the read ahead of the competing read streams.
>
> Attached is what I'm thinking of committing.

Okay, I committed this in da6874635db.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-21 18:41  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-04-21 18:41 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Apr 20, 2026 at 4:28 PM Melanie Plageman
<[email protected]> wrote:
>
> I think we can avoid having relallvisible updated by doing two things:
> 1) moving the analyze test_mode above the create index because the
> create index will scan the table and could set pages all-visible and
> then the analyze may update the statistics
> 2) create the table with autovacuum_enabled = false to avoid vacuum
> and analyze running after any of the other table scans may set some
> pages all-visible

So, I don't love that this test relies on pg_class not having been
updated to reflect relallvisible. But, I don't see a good way around
this. If we need the "bad plan" to be a seq scan, then
pg_class.relallvisible can't be updated. As long as only analyze and
vacuum will update pg_class.relallvisible, we can preserve the desired
behavior by disabling autovacuum/analyze for the table and analyzing
it (to get reltuples/relpages) before any scan that could set the VM
for the pages. So, I've done this and committed it in 85ae8ab0533.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-04-21 21:37  Melanie Plageman <[email protected]>
  parent: Melanie Plageman <[email protected]>
  1 sibling, 0 replies; 143+ messages in thread

From: Melanie Plageman @ 2026-04-21 21:37 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; David Rowley <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, Apr 20, 2026 at 12:18 PM Melanie Plageman
<[email protected]> wrote:
>
>
> Yes, I think changing it to a temp table is the easiest fix. We could
> also do autovacuum_enabled=false, I think, but making it a temp table
> seems cleanest.
>
> I wonder if we should move the EXPLAIN test above the results queries,
> then throw in a vacuum in between some of them so we exercise btree
> gist as a bitmap heap scan and as an index only scan. It could provide
> a little bit more coverage? Or maybe that isn't actually extra
> coverage. I'm not sure.

I kept it simple and just committed making it a temp table in 62407d26b7c.

- Melanie








Thread overview: 143+ messages
2025-06-23 20:25 eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2025-06-26 22:04 ` Melanie Plageman <[email protected]>
2025-07-09 21:59   ` Melanie Plageman <[email protected]>
2025-07-11 22:19     ` Melanie Plageman <[email protected]>
2025-07-13 18:34       ` Andrey Borodin <[email protected]>
2025-07-13 19:15         ` Melanie Plageman <[email protected]>
2025-07-14 06:37           ` Andrey Borodin <[email protected]>
2025-07-31 22:58             ` Melanie Plageman <[email protected]>
2025-08-01 21:36               ` Melanie Plageman <[email protected]>
2025-08-26 09:58                 ` Kirill Reshke <[email protected]>
2025-08-27 19:02                   ` Melanie Plageman <[email protected]>
2025-08-28 09:11                     ` Kirill Reshke <[email protected]>
2025-09-02 21:52                       ` Melanie Plageman <[email protected]>
2025-09-02 23:11                         ` Melanie Plageman <[email protected]>
2025-09-02 23:54                           ` Andres Freund <[email protected]>
2025-09-05 22:20                             ` Melanie Plageman <[email protected]>
2025-09-08 15:44                               ` Melanie Plageman <[email protected]>
2025-09-08 18:54                                 ` Robert Haas <[email protected]>
2025-09-08 19:14                                   ` Melanie Plageman <[email protected]>
2025-09-08 19:53                                     ` Robert Haas <[email protected]>
2025-09-08 20:14                                 ` Robert Haas <[email protected]>
2025-09-08 22:28                                   ` Melanie Plageman <[email protected]>
2025-09-09 14:00                                     ` Robert Haas <[email protected]>
2025-09-09 16:24                                       ` Melanie Plageman <[email protected]>
2025-09-09 19:26                                         ` Robert Haas <[email protected]>
2025-09-09 23:07                                           ` Melanie Plageman <[email protected]>
2025-09-10 20:01                                             ` Robert Haas <[email protected]>
2025-09-18 00:10                                               ` Melanie Plageman <[email protected]>
2025-09-18 16:48                                                 ` Andres Freund <[email protected]>
2025-09-24 17:07                                                   ` Melanie Plageman <[email protected]>
2025-09-24 20:13                                                     ` Robert Haas <[email protected]>
2025-10-06 22:40                                                       ` Melanie Plageman <[email protected]>
2025-10-08 22:54                                                         ` Melanie Plageman <[email protected]>
2025-10-09 18:18                                                           ` Andres Freund <[email protected]>
2025-10-14 23:26                                                             ` Melanie Plageman <[email protected]>
2025-10-29 11:03                                                               ` Kirill Reshke <[email protected]>
2025-11-04 16:48                                                                 ` Melanie Plageman <[email protected]>
2025-11-17 23:07                                                                   ` Melanie Plageman <[email protected]>
2025-11-19 09:35                                                                     ` Kirill Reshke <[email protected]>
2025-11-19 23:13                                                                       ` Melanie Plageman <[email protected]>
2025-11-20 17:19                                                                         ` Melanie Plageman <[email protected]>
2025-11-21 01:09                                                                           ` Chao Li <[email protected]>
2025-11-24 08:07                                                                             ` Chao Li <[email protected]>
2025-11-24 09:31                                                                               ` Chao Li <[email protected]>
2025-12-03 23:08                                                                               ` Melanie Plageman <[email protected]>
2025-11-25 21:43                                                                             ` Melanie Plageman <[email protected]>
2025-11-24 22:24                                                                           ` Andres Freund <[email protected]>
2025-12-03 23:07                                                                             ` Melanie Plageman <[email protected]>
2025-12-04 05:10                                                                               ` Chao Li <[email protected]>
2025-12-09 17:48                                                                                 ` Melanie Plageman <[email protected]>
2025-12-10 23:35                                                                                   ` Melanie Plageman <[email protected]>
2025-12-11 04:06                                                                                     ` Chao Li <[email protected]>
2025-12-15 21:29                                                                                       ` Melanie Plageman <[email protected]>
2025-12-16 16:58                                                                               ` Melanie Plageman <[email protected]>
2025-12-17 18:27                                                                                 ` Kirill Reshke <[email protected]>
2025-12-18 00:30                                                                                   ` Melanie Plageman <[email protected]>
2025-12-18 08:55                                                                                     ` Kirill Reshke <[email protected]>
2025-12-18 15:18                                                                                       ` Melanie Plageman <[email protected]>
2025-12-18 15:45                                                                                         ` Kirill Reshke <[email protected]>
2025-12-18 20:04                                                                                           ` Melanie Plageman <[email protected]>
2025-12-18 18:07                                                                                         ` Kirill Reshke <[email protected]>
2025-12-18 19:57                                                                                           ` Melanie Plageman <[email protected]>
2025-12-18 20:31                                                                                             ` Kirill Reshke <[email protected]>
2025-12-19 03:38                                                                                 ` Xuneng Zhou <[email protected]>
2025-12-19 21:09                                                                                   ` Melanie Plageman <[email protected]>
2025-12-20 12:32                                                                                     ` Kirill Reshke <[email protected]>
2025-12-22 18:20                                                                                       ` Melanie Plageman <[email protected]>
2025-12-22 07:19                                                                                     ` Chao Li <[email protected]>
2025-12-22 17:57                                                                                       ` Melanie Plageman <[email protected]>
2025-12-23 00:00                                                                                         ` Chao Li <[email protected]>
2025-12-23 01:18                                                                                           ` Melanie Plageman <[email protected]>
2026-03-02 23:38                                                                                             ` Melanie Plageman <[email protected]>
2026-03-03 07:32                                                                                               ` Chao Li <[email protected]>
2026-03-03 15:52                                                                                                 ` Melanie Plageman <[email protected]>
2026-03-04 08:59                                                                                                   ` Chao Li <[email protected]>
2026-03-05 08:52                                                                                                     ` Chao Li <[email protected]>
2026-03-06 02:40                                                                                                       ` Chao Li <[email protected]>
2026-03-06 23:33                                                                                                         ` Melanie Plageman <[email protected]>
2026-03-11 17:01                                                                                                           ` Melanie Plageman <[email protected]>
2026-03-15 19:10                                                                                                             ` Melanie Plageman <[email protected]>
2026-03-16 14:53                                                                                                               ` Melanie Plageman <[email protected]>
2026-03-17 09:05                                                                                                                 ` Kirill Reshke <[email protected]>
2026-03-17 14:48                                                                                                                   ` Melanie Plageman <[email protected]>
2026-03-18 17:14                                                                                                                     ` Andres Freund <[email protected]>
2026-03-20 02:38                                                                                                                       ` Melanie Plageman <[email protected]>
2026-03-20 23:37                                                                                                                         ` Melanie Plageman <[email protected]>
2026-03-22 19:58                                                                                                                           ` Melanie Plageman <[email protected]>
2026-03-23 21:54                                                                                                                             ` Melanie Plageman <[email protected]>
2026-03-24 06:53                                                                                                                               ` Kirill Reshke <[email protected]>
2026-03-24 17:53                                                                                                                               ` Andres Freund <[email protected]>
2026-03-24 23:44                                                                                                                                 ` Melanie Plageman <[email protected]>
2026-03-25 18:54                                                                                                                                   ` Melanie Plageman <[email protected]>
2026-03-25 23:14                                                                                                                                     ` Melanie Plageman <[email protected]>
2026-03-26 23:10                                                                                                                                       ` David Rowley <[email protected]>
2026-03-27 19:17                                                                                                                                         ` Melanie Plageman <[email protected]>
2026-03-29 17:16                                                                                                                                           ` Melanie Plageman <[email protected]>
2026-03-31 02:16                                                                                                                                             ` David Rowley <[email protected]>
2026-03-31 16:19                                                                                                                                               ` Melanie Plageman <[email protected]>
2026-03-31 22:14                                                                                                                                                 ` David Rowley <[email protected]>
2026-03-31 22:55                                                                                                                                                   ` Melanie Plageman <[email protected]>
2026-04-03 05:00                                                                                                                                                 ` Alexander Lakhin <[email protected]>
2026-04-05 15:42                                                                                                                                                   ` Melanie Plageman <[email protected]>
2026-04-05 15:49                                                                                                                                                     ` Andres Freund <[email protected]>
2026-04-16 20:21                                                                                                                                                       ` Melanie Plageman <[email protected]>
2026-04-18 16:33                                                                                                                                                         ` Andres Freund <[email protected]>
2026-04-20 15:48                                                                                                                                                           ` Melanie Plageman <[email protected]>
2026-04-21 15:07                                                                                                                                                             ` Melanie Plageman <[email protected]>
2026-04-18 15:00                                                                                                                                                   ` Alexander Lakhin <[email protected]>
2026-04-18 16:25                                                                                                                                                     ` Andres Freund <[email protected]>
2026-04-18 18:00                                                                                                                                                       ` Alexander Lakhin <[email protected]>
2026-04-20 16:18                                                                                                                                                         ` Melanie Plageman <[email protected]>
2026-04-20 19:00                                                                                                                                                           ` Alexander Lakhin <[email protected]>
2026-04-20 20:28                                                                                                                                                             ` Melanie Plageman <[email protected]>
2026-04-21 18:41                                                                                                                                                               ` Melanie Plageman <[email protected]>
2026-04-21 21:37                                                                                                                                                           ` Melanie Plageman <[email protected]>
2026-03-25 23:29                                                                                                                                     ` Tomas Vondra <[email protected]>
2026-03-26 14:51                                                                                                                                       ` Melanie Plageman <[email protected]>
2026-03-26 16:07                                                                                                                                         ` Tomas Vondra <[email protected]>
2026-03-27 19:31                                                                                                                                           ` Melanie Plageman <[email protected]>
2026-03-03 00:04                                                                                             ` Melanie Plageman <[email protected]>
2025-12-13 13:59                                                                           ` Peter Eisentraut <[email protected]>
2025-12-15 21:05                                                                             ` Melanie Plageman <[email protected]>
2025-12-16 12:18                                                                               ` Peter Eisentraut <[email protected]>
2025-12-16 16:07                                                                                 ` Melanie Plageman <[email protected]>
2025-11-20 17:55                                                                         ` Dagfinn Ilmari Mannsåker <[email protected]>
2025-11-20 18:02                                                                           ` Dagfinn Ilmari Mannsåker <[email protected]>
2025-11-20 22:23                                                                           ` Melanie Plageman <[email protected]>
2025-10-14 03:31                                                           ` Kirill Reshke <[email protected]>
2025-10-14 03:42                                                             ` Michael Paquier <[email protected]>
2025-10-14 14:16                                                               ` Melanie Plageman <[email protected]>
2026-04-06 14:30                                                           ` Alexey Makhmutov <[email protected]>
2026-04-06 14:44                                                             ` Andres Freund <[email protected]>
2026-04-06 15:32                                                               ` Alexey Makhmutov <[email protected]>
2026-04-13 16:17                                                                 ` Melanie Plageman <[email protected]>
2026-04-16 16:19                                                                   ` Melanie Plageman <[email protected]>
2025-09-08 16:41                               ` Robert Haas <[email protected]>
2025-09-08 18:32                                 ` Melanie Plageman <[email protected]>
2025-09-03 09:06                           ` Kirill Reshke <[email protected]>
2025-09-05 22:27                             ` Melanie Plageman <[email protected]>
2025-08-26 20:01                 ` Kirill Reshke <[email protected]>
2025-08-27 19:08                   ` Melanie Plageman <[email protected]>
2025-08-27 12:55                 ` Kirill Reshke <[email protected]>
2025-08-27 13:08                   ` Melanie Plageman <[email protected]>
