public inbox for [email protected]  
From: Melanie Plageman <[email protected]>
To: Robert Haas <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: Mon, 6 Oct 2025 18:40:20 -0400
Message-ID: <CAAKRu_ZP-3=SaZykpwDBMJOdUKyW3Wm5JZfPFRR3L5Ac8ouq4w@mail.gmail.com>
In-Reply-To: <CA+TgmoYgCs=SEsohP6Z6R3KKsGaqFqvqxH8vQ_-nY4t+7rK8jg@mail.gmail.com>
References: <CAAKRu_Yz9x0sejBa5ov_LJ5sMOSKM3AeKOFUg+fQpNqyMmxwRA@mail.gmail.com>
	<CAAKRu_Y=QZ5iD7zt1AHcG3_G_iMR0w6ApGPgr8FKcDn-YLFvuQ@mail.gmail.com>
	<CA+TgmoasgmY7mzZutGisD2=3y7BwwPUS=oNsQoORKRg1r69fEA@mail.gmail.com>
	<CAAKRu_Y7X=0UAQa5b_2Z20z5+UPBtDbjazYD9228jmj-d9NpQA@mail.gmail.com>
	<CA+Tgmob05A07mtzeUGwxQKU9KZSf4BhJU9CXgcy4Pe3ZHxZrcw@mail.gmail.com>
	<CAAKRu_YX0NP_yhXvPnvDRjVxxprsRBM-_MZzAJskfMydMQ=ETA@mail.gmail.com>
	<CA+TgmoZef8XqRujP1NN=wJdV4SxOtu7rxRozsyAtaEvuVMZhEw@mail.gmail.com>
	<CAAKRu_YxD3UC3BXxS55jPjBC_yj_vn3FVoLvBMwQuHXGDXacGg@mail.gmail.com>
	<CA+TgmobYY2URHKBMh1NHo1zF3Z28TiS_+0aSyRYyBfvauCPZzA@mail.gmail.com>
	<CAAKRu_YOJ3VTKo4Z9vB2hGeTnwVWsL39gXH09vyBUQ7bGtDnKA@mail.gmail.com>
	<yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc@7uw6jyyxuyf7>
	<CAAKRu_ZiuR+YcUc7=TrgANbRakpzCu8X9zqR=Tf0fE6uDbfP1g@mail.gmail.com>
	<CA+TgmoYgCs=SEsohP6Z6R3KKsGaqFqvqxH8vQ_-nY4t+7rK8jg@mail.gmail.com>

On Wed, Sep 24, 2025 at 4:13 PM Robert Haas <[email protected]> wrote:
>
> I find this patch set quite hard to follow. 0001 altogether removes
> the use of XLOG_HEAP2_VISIBLE in cases where we use
> XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
> patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
> assisted by 0009 and 0010, and then later you come back and remove the
> other half of the dependency. I know it was I who proposed (off-list)
> first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
> and not the heap buffer, but I'm not sure that idea quite worked out
> in terms of making this easier to follow. At the least, it seems weird
> that COPY FREEZE is an exception that gets handled in a different way
> than all the other cases, fully removing the dependency in one step.
> It would also be nice if each time you repost this, or maybe in a
> README that you post along beside the actual patches, you'd include
> some kind of roadmap to help the reader understand the internal
> structure of the patch set, like 1 does this, 2-9 get us to here,
> 10-whatever get us to this next place.

In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
entirely, rather than first removing each caller's heap page from the
VM WAL chain. I reordered changes and squashed several refactoring
patches to improve patch-by-patch readability. This should make the
set read differently from earlier versions that removed
XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.

I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
having intermediate patches that just set PD_ALL_VISIBLE when making
other changes to the heap page is more confusing than helpful. Also, I
think having separate flags for setting PD_ALL_VISIBLE in the WAL record
overcomplicated the code.

0001: remove XLOG_HEAP2_VISIBLE from COPY FREEZE
0002 - 0005: various refactoring in advance of removing
XLOG_HEAP2_VISIBLE in pruning
0006: Pruning and freezing by vacuum sets the VM and emits a single
WAL record with those changes
0007: Reaping (phase III) by vacuum sets the VM and sets line pointers
unused in a single WAL record
0008 - 0009: XLOG_HEAP2_VISIBLE is eliminated
0010 - 0012: preparation for setting VM on-access
0013: set VM on-access
0014: set pd_prune_xid on insert

> I find myself fearful of the way that 0007 propagates the existing
> hacks around setting the VM bit into a new place:
>
> +               /*
> +                * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
> +                * careful not to emit a full page image unless
> +                * checksums/wal_log_hints are enabled. We only set the heap page LSN
> +                * if full page images were an option when emitting WAL. Otherwise,
> +                * subsequent modifications of the page may incorrectly skip emitting
> +                * a full page image.
> +                */
> +               if (do_prune || nplans > 0 ||
> +                       (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
> +                       PageSetLSN(page, lsn);
>
> I suppose it's not the worst thing to duplicate this logic, because
> you're later going to remove the original copy. But, it took me >10
> minutes to find the text in src/backend/access/transam/README, in the
> second half of the "Writing Hints" section, that explains the overall
> principle here, and since the patch set doesn't seem to touch that
> text, maybe you weren't even aware it was there.

I don't think that src/backend/access/transam/README must change with
my patch. It is still true that if the only change we are making to
the heap page is setting PD_ALL_VISIBLE and checksums/wal_log_hints
are disabled, we explicitly avoid an FPI and thus can't stamp the page
LSN.
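To make the condition in the quoted hunk concrete, here is a minimal C
sketch of when a combined prune/freeze/set-VM record may stamp the heap
page LSN. The function and parameter names are hypothetical and are not
PostgreSQL API; the parameters stand in for do_prune, nplans,
XLHP_SET_PD_ALL_VIS and XLogHintBitIsNeeded(). It assumes the reasoning
from the "Writing Hints" section of the transam README: whether a later
change needs a full page image is judged from the page LSN, so the LSN
should not advance via a record that could not have carried an FPI.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not PostgreSQL API.  Parameters stand in for
 * do_prune, nplans > 0, (xlrec.flags & XLHP_SET_PD_ALL_VIS) and
 * XLogHintBitIsNeeded() from the quoted hunk.
 */
static bool
sketch_may_set_heap_page_lsn(bool did_prune, int nfreeze_plans,
							 bool set_pd_all_visible, bool hint_bits_need_wal)
{
	/* Pruning or freezing really modifies the page: always stamp the LSN. */
	if (did_prune || nfreeze_plans > 0)
		return true;

	/*
	 * Only PD_ALL_VISIBLE changed.  The record includes a heap-page FPI only
	 * when checksums/wal_log_hints are on; without one, bumping the LSN
	 * could let a later modification skip its FPI after a checkpoint.
	 */
	return set_pd_all_visible && hint_bits_need_wal;
}
```

For example, a record that only sets PD_ALL_VISIBLE with checksums and
wal_log_hints off yields false, so the page LSN is left alone.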

> And, it's a little
> weird to have a single WAL record that is either a hint or not a hint
> depending on a complex set of conditions.

PD_ALL_VISIBLE is different from tuple hints and other page hints
because setting the VM is always WAL-logged, and replaying that record
always sets PD_ALL_VISIBLE, so PD_ALL_VISIBLE is effectively always
WAL-logged. The other hints aren't WAL-logged unless checksums (or
wal_log_hints) are enabled and we need an FPI. So PD_ALL_VISIBLE differs
from other page hints in multiple ways. We can't make it more like those
hints because we need to preserve the invariant that the VM bit is never
set when PD_ALL_VISIBLE is clear on the heap page. The only thing we
could do is forbid omitting the FPI even when checksums are not enabled.
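A toy C model of the invariant described above may help. Everything
here (the struct and function names) is hypothetical and heavily
simplified: a heap page is reduced to its PD_ALL_VISIBLE flag plus the
corresponding VM bit, and "redo" of a VM-setting record always sets both,
which is why PD_ALL_VISIBLE ends up effectively WAL-logged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one heap page and its VM bit. */
typedef struct SketchHeapPage
{
	bool		pd_all_visible;	/* page-level PD_ALL_VISIBLE flag */
	bool		vm_all_visible;	/* corresponding visibility map bit */
} SketchHeapPage;

/* Invariant: the VM bit is never set while PD_ALL_VISIBLE is clear. */
static bool
sketch_invariant_holds(const SketchHeapPage *p)
{
	return !p->vm_all_visible || p->pd_all_visible;
}

/*
 * Sketch of replaying a record that sets the VM: it also sets
 * PD_ALL_VISIBLE on the heap page, whether or not an FPI was logged.
 */
static void
sketch_redo_set_vm(SketchHeapPage *p)
{
	p->pd_all_visible = true;
	p->vm_all_visible = true;
	assert(sketch_invariant_holds(p));
}

/* Clearing all-visibility clears both the VM bit and the page flag. */
static void
sketch_clear_all_visible(SketchHeapPage *p)
{
	p->vm_all_visible = false;
	p->pd_all_visible = false;
}
```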

> Anyway, I kind of wonder if it's time to back out the hack that I
> installed here many years ago. At the time, I thought that it would be
> bad if a VACUUM swept over the visibility map setting VM bits and as a
> result emitted an FPI for every page in the entire heap ... but
> everyone who is running with checksums has accepted that cost already,
> and with those being the default, that's probably going to be most
> people.

I agree that PD_ALL_VISIBLE persistence is complicated, but we have
other special cases that complicate the code for a performance
benefit. I guess the question is whether we are saying people shouldn't
run without checksums in production. If so, then it's fine to remove
this optimization. Otherwise, I'm not so sure.

I think cloud providers generally have checksums enabled, but I don't
know what is common on-prem.

> It would be even more compelling if we were going to freeze,
> prune, and set all-visible on access, because then presumably the case
> where we touch a page and ONLY set the VM bit would be rare, so the
> cost of doing that wouldn't matter much, but I guess the patch doesn't
> go that far -- we can freeze or set all-visible on access but not
> prune, without which the scenario I was worrying about at the time is
> still fairly plausible, I think, if checksums are turned off.

With the whole set applied, we can prune and set the VM on access but
not freeze. I have a patch to do that, but it introduced noticeable
CPU overhead to prepare the freeze plans. I'd have to spend much more
time studying it to avoid regressing workloads that prepare freeze plans
during SELECT queries but don't end up freezing.

- Melanie


Attachments:

  [text/x-patch] v16-0005-Make-heap_page_is_all_visible-independent-of-LVR.patch (5.9K, 2-v16-0005-Make-heap_page_is_all_visible-independent-of-LVR.patch)
From 280948d3f1f18b8a6c473d6b56023b0c795f0efa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 3 Oct 2025 15:57:02 -0400
Subject: [PATCH v16 05/14] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++-------------
 src/include/access/heapam.h          |  6 ++++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8eef436dd10..aed1f8e1139 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
-									 TransactionId *visibility_cutoff_xid, bool *all_frozen);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2014,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel, buf,
-									  &debug_cutoff, &debug_all_frozen))
+		if (!heap_page_is_all_visible(vacrel->rel, buf,
+									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+									  &debug_cutoff, &vacrel->offnum))
 			Assert(false);
 
 		Assert(presult.all_frozen == debug_all_frozen);
@@ -2917,8 +2916,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * emitted.
 	 */
 	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
-								 &all_frozen))
+	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+								 &all_frozen,
+								 &visibility_cutoff_xid,
+								 &vacrel->offnum))
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
@@ -3608,15 +3609,20 @@ dead_items_cleanup(LVRelState *vacrel)
  * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
  * on this page is frozen.
  *
- * This is a stripped down version of lazy_scan_prune().  If you change
- * anything here, make sure that everything stays in sync.  Note that an
- * assertion calls us to verify that everybody still agrees.  Be sure to avoid
- * introducing new side-effects here.
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal().  If
+ * you change anything here, make sure that everything stays in sync.  Note
+ * that an assertion calls us to verify that everybody still agrees.  Be sure
+ * to avoid introducing new side-effects here.
  */
-static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
-						 bool *all_frozen)
+						 OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3639,7 +3645,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		vacrel->offnum = offnum;
+		*logging_offnum = offnum;
 		itemid = PageGetItemId(page, offnum);
 
 		/* Unused or redirect line pointers are of no interest */
@@ -3663,10 +3669,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
-										 buf))
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3685,8 +3690,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin,
-											   vacrel->cutoffs.OldestXmin))
+					if (!TransactionIdPrecedes(xmin, OldestXmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
@@ -3721,7 +3725,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 	}							/* scan along page */
 
 	/* Clear the offset information once we have processed the given page. */
-	vacrel->offnum = InvalidOffsetNumber;
+	*logging_offnum = InvalidOffsetNumber;
 
 	return all_visible;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc71fef6643..ea67fb83fbe 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -432,6 +432,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 TransactionId OldestXmin,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
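The point of 0005 is that the all-visible check only needs a relation,
a buffer, OldestXmin, and a place to report the offset being processed,
rather than the whole LVRelState. As a rough, self-contained C analogue
(all names are hypothetical; tuples are reduced to a status, an xmin and
a frozen flag, and xid comparison assumes normal xids only, in the style
of TransactionIdPrecedes()), the decoupled shape looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;
typedef uint16_t SketchOffset;

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_INVALID_OFFSET	((SketchOffset) 0)

/* Wraparound-aware "a precedes b" for normal xids. */
static bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

typedef enum
{
	SKETCH_LIVE, SKETCH_RECENTLY_DEAD, SKETCH_DEAD
} SketchStatus;

typedef struct SketchTuple
{
	SketchStatus status;
	SketchXid	xmin;
	bool		frozen;
} SketchTuple;

/* Only explicit inputs and outputs: no vacuum-wide state struct. */
static bool
sketch_page_is_all_visible(const SketchTuple *tuples, size_t ntuples,
						   SketchXid oldest_xmin, bool *all_frozen,
						   SketchXid *visibility_cutoff_xid,
						   SketchOffset *logging_offnum)
{
	bool		all_visible = true;

	*all_frozen = true;
	*visibility_cutoff_xid = SKETCH_INVALID_XID;

	for (size_t i = 0; i < ntuples; i++)
	{
		/* Offsets are 1-based; report the one being examined. */
		*logging_offnum = (SketchOffset) (i + 1);

		/* Not live, or xmin not older than OldestXmin: not all-visible. */
		if (tuples[i].status != SKETCH_LIVE ||
			!sketch_xid_precedes(tuples[i].xmin, oldest_xmin))
		{
			all_visible = false;
			*all_frozen = false;
			break;
		}

		/* Track the newest xmin among the visible tuples. */
		if (*visibility_cutoff_xid == SKETCH_INVALID_XID ||
			sketch_xid_precedes(*visibility_cutoff_xid, tuples[i].xmin))
			*visibility_cutoff_xid = tuples[i].xmin;

		if (!tuples[i].frozen)
			*all_frozen = false;
	}

	/* Clear the error-context offset once the page has been processed. */
	*logging_offnum = SKETCH_INVALID_OFFSET;
	return all_visible;
}
```

A caller without an LVRelState (e.g. code in pruneheap.c) can pass a
local variable for logging_offnum, just as vacuum passes &vacrel->offnum.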



  [text/x-patch] v16-0004-Update-PruneState.all_-visible-frozen-earlier-in.patch (14.8K, 3-v16-0004-Update-PruneState.all_-visible-frozen-earlier-in.patch)
From a5772e0eec65df1cf064055b1ba77a51861f7fe8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v16 04/14] Update PruneState.all_[visible|frozen] earlier in
 pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen when dead items are present. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags promptly avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). At present this has no runtime effect
because all callers that consider setting the VM also attempt freezing,
but future callers (e.g. on-access pruning) may want to set the VM
without preparing freeze plans.

We also used to defer clearing all_visible and all_frozen until after
computing the visibility cutoff XID. By determining the cutoff earlier,
we can update these flags immediately after deciding whether to
opportunistically freeze. This is necessary if we want to set the VM in
the same WAL record that prunes and freezes tuples on the page.

While we are at it, unset all_frozen whenever we unset all_visible.
Previously, all_frozen was only meaningful in combination with
all_visible, because all_frozen was not unset when tuples that are not
all-visible were encountered. It is best to keep both up to date to
avoid mistakes when using all_frozen.
---
 src/backend/access/heap/pruneheap.c  | 145 ++++++++++++++-------------
 src/backend/access/heap/vacuumlazy.c |   9 +-
 2 files changed, 78 insertions(+), 76 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f819ab57d55..c23a6a21a7f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -137,15 +137,12 @@ typedef struct
 	 * bits.  It is only valid if we froze some tuples, and all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen don't include LP_DEAD items.  That's
-	 * convenient for heap_page_prune_and_freeze(), to use them to decide
-	 * whether to freeze the page or not.  The all_visible and all_frozen
-	 * values returned to the caller are adjusted to include LP_DEAD items at
-	 * the end.
-	 *
-	 * all_frozen should only be considered valid if all_visible is also set;
-	 * we don't bother to clear the all_frozen flag every time we clear the
-	 * all_visible flag.
+	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+	 * That's convenient for heap_page_prune_and_freeze(), to use them to
+	 * decide whether to freeze the page or not.  The all_visible and
+	 * all_frozen values returned to the caller are adjusted to include
+	 * LP_DEAD items after we determine whether or not to opportunistically
+	 * freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -308,7 +305,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * pre-freeze checks.
  *
  * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record should we decide to freeze
+ * tuples.
  *
  * prstate is an input/output parameter.
  *
@@ -320,7 +319,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 					  bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
-					  PruneState *prstate)
+					  PruneState *prstate,
+					  TransactionId *frz_conflict_horizon)
 {
 	bool		do_freeze = false;
 
@@ -357,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->all_frozen && prstate->nfrozen > 0)
 		{
+			Assert(prstate->all_visible);
+
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
@@ -388,6 +390,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+		/*
+		 * Calculate what the snapshot conflict horizon should be for a record
+		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+		 * for conflicts when the whole page is eligible to become all-frozen
+		 * in the VM once we're done with it.  Otherwise we generate a
+		 * conservative cutoff by stepping back from OldestXmin.
+		 */
+		if (prstate->all_frozen)
+			*frz_conflict_horizon = prstate->visibility_cutoff_xid;
+		else
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			*frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+			TransactionIdRetreat(*frz_conflict_horizon);
+		}
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -432,10 +450,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
  * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set.  They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set.  They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -471,6 +490,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_hint_prune;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
@@ -540,10 +560,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * are tuples present that are not visible to everyone or if there are
 	 * dead tuples which are not yet removable.  However, dead tuples which
 	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not clear
-	 * all_visible when we see LP_DEAD items.  We fix that at the end of the
-	 * function, when we return the value to the caller, so that the caller
-	 * doesn't set the VM bit incorrectly.
+	 * opportunistically freezing.  Because of that, we do not immediately
+	 * clear all_visible when we see LP_DEAD items.  We fix that after
+	 * scanning the line pointers, before we return the value to the caller,
+	 * so that the caller doesn't set the VM bit incorrectly.
 	 */
 	if (prstate.attempt_freeze)
 	{
@@ -778,8 +798,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 									  did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
-									  &prstate);
+									  &prstate,
+									  &frz_conflict_horizon);
 
+	/*
+	 * While scanning the line pointers, we did not clear
+	 * all_visible/all_frozen when encountering LP_DEAD items because we
+	 * wanted the decision whether or not to freeze the page to be unaffected
+	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
+	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
+	 * matter which vacuum heap pass (initial pass or final pass) ends up
+	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 *
+	 * Now that we finished determining whether or not to freeze the page,
+	 * update all_visible and all_frozen so that they reflect the true state
+	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 */
+	if (prstate.lpdead_items > 0)
+		prstate.all_visible = prstate.all_frozen = false;
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -838,27 +876,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * on the standby with xids older than the youngest tuple this
 			 * record will freeze will conflict.
 			 */
-			TransactionId frz_conflict_horizon = InvalidTransactionId;
 			TransactionId conflict_xid;
 
-			/*
-			 * We can use the visibility_cutoff_xid as our cutoff for
-			 * conflicts when the whole page is eligible to become all-frozen
-			 * in the VM once we're done with it.  Otherwise we generate a
-			 * conservative cutoff by stepping back from OldestXmin.
-			 */
-			if (do_freeze)
-			{
-				if (prstate.all_visible && prstate.all_frozen)
-					frz_conflict_horizon = prstate.visibility_cutoff_xid;
-				else
-				{
-					/* Avoids false conflicts when hot_standby_feedback in use */
-					frz_conflict_horizon = prstate.cutoffs->OldestXmin;
-					TransactionIdRetreat(frz_conflict_horizon);
-				}
-			}
-
 			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
 				conflict_xid = frz_conflict_horizon;
 			else
@@ -882,30 +901,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
-	/*
-	 * It was convenient to ignore LP_DEAD items in all_visible earlier on to
-	 * make the choice of whether or not to freeze the page unaffected by the
-	 * short-term presence of LP_DEAD items.  These LP_DEAD items were
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
-	 *
-	 * Now that freezing has been finalized, unset all_visible if there are
-	 * any LP_DEAD items on the page.  It needs to reflect the present state
-	 * of the page, as expected by our caller.
-	 */
-	if (prstate.all_visible && prstate.lpdead_items == 0)
-	{
-		presult->all_visible = prstate.all_visible;
-		presult->all_frozen = prstate.all_frozen;
-	}
-	else
-	{
-		presult->all_visible = false;
-		presult->all_frozen = false;
-	}
-
+	presult->all_visible = prstate.all_visible;
+	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
@@ -1285,8 +1282,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 
 	/*
 	 * Deliberately delay unsetting all_visible until later during pruning.
-	 * Removable dead tuples shouldn't preclude freezing the page.
+	 * Removable dead tuples shouldn't preclude freezing the page. If we won't
+	 * attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1412,7 +1412,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1434,7 +1434,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
+					prstate->all_visible = prstate->all_frozen = false;
 					break;
 				}
 
@@ -1447,7 +1447,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1466,7 +1466,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1484,7 +1484,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
+			prstate->all_visible = prstate->all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1552,8 +1552,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible until later, at the end of
 	 * heap_page_prune_and_freeze().  This will allow us to attempt to freeze
 	 * the page after pruning.  As long as we unset it before updating the
-	 * visibility map, this will be correct.
+	 * visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+		prstate->all_visible = prstate->all_frozen = false;
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6125f157709..8eef436dd10 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2007,7 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * agreement with heap_page_is_all_visible() using an assertion.
 	 */
 #ifdef USE_ASSERT_CHECKING
-	/* Note that all_frozen value does not matter when !all_visible */
 	if (presult.all_visible)
 	{
 		TransactionId debug_cutoff;
@@ -2060,6 +2059,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
 	Assert(!presult.all_visible || !(*has_lpdead_items));
+	Assert(!presult.all_frozen || presult.all_visible);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
@@ -2165,11 +2165,10 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both all_visible and all_frozen.
+	 * it as all-frozen.
 	 */
-	else if (all_visible_according_to_vm && presult.all_visible &&
-			 presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if (all_visible_according_to_vm && presult.all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		uint8		old_vmbits;
 
-- 
2.43.0
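The ordering that 0004 establishes can be summarized in a small C model:
LP_DEAD items only defer clearing all_visible/all_frozen when freezing
will be attempted, the freeze conflict horizon is chosen at freeze-decision
time, and only afterwards are the flags made to reflect LP_DEAD items. The
struct and functions below are hypothetical simplifications of
PruneState, heap_prune_record_dead() and heap_page_will_freeze(); the
retreat helper assumes TransactionIdRetreat() steps back while skipping
xids below FirstNormalTransactionId (3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

typedef struct SketchPruneState
{
	bool		attempt_freeze;
	bool		all_visible;
	bool		all_frozen;
	int			lpdead_items;
	SketchXid	visibility_cutoff_xid;
	SketchXid	oldest_xmin;
} SketchPruneState;

/* Step back one xid, skipping the special xids below the first normal one. */
static SketchXid
sketch_xid_retreat(SketchXid xid)
{
	do
	{
		xid--;
	} while (xid < SKETCH_FIRST_NORMAL_XID);
	return xid;
}

/* LP_DEAD item: clear the flags now only if we won't attempt freezing. */
static void
sketch_record_dead(SketchPruneState *ps)
{
	if (!ps->attempt_freeze)
		ps->all_visible = ps->all_frozen = false;
	ps->lpdead_items++;
}

/*
 * After deciding whether to freeze: pick the conflict horizon, then make
 * the flags reflect LP_DEAD items so they are safe for setting the VM.
 */
static SketchXid
sketch_finish_freeze_decision(SketchPruneState *ps, bool do_freeze)
{
	SketchXid	horizon = 0;	/* InvalidTransactionId */

	if (do_freeze)
		horizon = ps->all_frozen ? ps->visibility_cutoff_xid
			: sketch_xid_retreat(ps->oldest_xmin);

	if (ps->lpdead_items > 0)
		ps->all_visible = ps->all_frozen = false;

	assert(!ps->all_frozen || ps->all_visible);
	return horizon;
}
```

With this ordering, all_frozen implies all_visible at every point, which
is what lets callers such as lazy_scan_prune() test all_frozen alone.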



  [text/x-patch] v16-0002-Assorted-trivial-heap_page_prune_and_freeze-clea.patch (15.6K, 4-v16-0002-Assorted-trivial-heap_page_prune_and_freeze-clea.patch)
From 33a35d23ae88d634cb01024295099e5d5466b1a3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v16 02/14] Assorted trivial heap_page_prune_and_freeze cleanup

Group heap_page_prune_and_freeze() input parameters in a struct and
clean up their documentation.

Rename a member of PruneState and disambiguate some local
heap_page_prune_and_freeze() variables.
---
 src/backend/access/heap/pruneheap.c  | 114 +++++++++++++--------------
 src/backend/access/heap/vacuumlazy.c |  16 ++--
 src/include/access/heapam.h          |  62 ++++++++++++---
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 115 insertions(+), 78 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..9ba89b1fc28 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,8 +42,8 @@ typedef struct
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
-	bool		freeze;
-	struct VacuumCutoffs *cutoffs;
+	bool		attempt_freeze;
+	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
 	 * Fields describing what to do to the page
@@ -253,15 +253,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
 			OffsetNumber dummy_off_loc;
+			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.relation = relation;
+			params.buffer = buffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
+
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
 			 * not the relation has indexes, since we cannot safely determine
 			 * that during on-access pruning with the current implementation.
 			 */
-			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
-									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+			params.options = 0;
+
+			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc, NULL, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -303,60 +311,43 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * also need to account for a reduction in the length of the line pointer
  * array following array truncation by us.
  *
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now.  The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
  *
- * options:
- *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- *   pruning.
- *
- *   FREEZE indicates that we will also freeze tuples, and will return
- *   'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now.  The 'params->cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing.  When that option
+ * is set, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set.  They are always set to false when
+ * the option is not set, because at the moment only callers that also freeze
+ * need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
  * heap_page_prune_and_freeze() is responsible for initializing it.  Required
  * by all callers.
  *
- * reason indicates why the pruning is performed.  It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
  * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far.  They will be updated with oldest
- * values present on the page after pruning.  After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far.  They will be updated
+ * with oldest values present on the page after pruning.  After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
  */
 void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-						   GlobalVisState *vistest,
-						   int options,
-						   struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   PruneFreezeResult *presult,
-						   PruneReason reason,
 						   OffsetNumber *off_loc,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
+	Buffer		buffer = params->buffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -365,15 +356,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	HeapTupleData tup;
 	bool		do_freeze;
 	bool		do_prune;
-	bool		do_hint;
-	bool		hint_bit_fpi;
+	bool		do_hint_prune;
+	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
 	/* Copy parameters to prstate */
-	prstate.vistest = vistest;
-	prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
-	prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
-	prstate.cutoffs = cutoffs;
+	prstate.vistest = params->vistest;
+	prstate.mark_unused_now =
+		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.cutoffs = params->cutoffs;
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -394,7 +386,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	/* initialize page freezing working state */
 	prstate.pagefrz.freeze_required = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
 		prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -441,7 +433,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * function, when we return the value to the caller, so that the caller
 	 * doesn't set the VM bit incorrectly.
 	 */
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
@@ -467,7 +459,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(relation);
+	tup.t_tableOid = RelationGetRelid(params->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -555,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
 	 * an FPI to be emitted.
 	 */
-	hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+	did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
 
 	/*
 	 * Process HOT chains.
@@ -663,7 +655,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 		PageIsFull(page);
 
 	/*
@@ -671,7 +663,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * plans we prepared, or not.
 	 */
 	do_freeze = false;
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (prstate.pagefrz.freeze_required)
 		{
@@ -702,16 +694,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				 * Freezing would make the page all-frozen.  Have already
 				 * emitted an FPI or will do so anyway?
 				 */
-				if (RelationNeedsWAL(relation))
+				if (RelationNeedsWAL(params->relation))
 				{
-					if (hint_bit_fpi)
+					if (did_tuple_hint_fpi)
 						do_freeze = true;
 					else if (do_prune)
 					{
 						if (XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
 					}
-					else if (do_hint)
+					else if (do_hint_prune)
 					{
 						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
 							do_freeze = true;
@@ -753,7 +745,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
-	if (do_hint)
+	if (do_hint_prune)
 	{
 		/*
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
@@ -796,7 +788,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(relation))
+		if (RelationNeedsWAL(params->relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -834,9 +826,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(relation, buffer,
+			log_heap_prune_and_freeze(params->relation, buffer,
 									  conflict_xid,
-									  true, reason,
+									  true, params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -894,7 +886,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
-	if (prstate.freeze)
+	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
 		{
@@ -1476,7 +1468,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 	}
 
 	/* Consider freezing any normal tuples which will not be removed */
-	if (prstate->freeze)
+	if (prstate->attempt_freeze)
 	{
 		bool		totally_frozen;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab6938d1da1..6125f157709 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1951,10 +1951,16 @@ lazy_scan_prune(LVRelState *vacrel,
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
-	int			prune_options = 0;
+	PruneFreezeParams params;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
+	params.relation = rel;
+	params.buffer = buf;
+	params.reason = PRUNE_VACUUM_SCAN;
+	params.cutoffs = &vacrel->cutoffs;
+	params.vistest = vacrel->vistest;
+
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
 	 *
@@ -1970,12 +1976,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	prune_options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE;
 	if (vacrel->nindexes == 0)
-		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
-	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
-							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+	heap_page_prune_and_freeze(&params,
+							   &presult,
 							   &vacrel->offnum,
 							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e60d34dad25..bc71fef6643 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+	PRUNE_ON_ACCESS,			/* on-access pruning */
+	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
+	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+	Relation	relation;		/* relation containing buffer to be pruned */
+	Buffer		buffer;			/* buffer to be pruned */
+
+	/*
+	 * The reason pruning was performed.  It determines the WAL record
+	 * opcode, which is used only for debugging and analysis purposes.
+	 */
+	PruneReason reason;
+
+	/*
+	 * Contains flag bits:
+	 *
+	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+	 * pruning.
+	 *
+	 * FREEZE indicates that we will also freeze tuples, and will return
+	 * 'all_visible', 'all_frozen' flags to the caller.
+	 */
+	int			options;
+
+	/*
+	 * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+	 * (see heap_prune_satisfies_vacuum).
+	 */
+	GlobalVisState *vistest;
+
+	/*
+	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
+	 * beginning of vacuuming the relation.  Required if HEAP_PAGE_PRUNE_FREEZE
+	 * option is set.  cutoffs->OldestXmin is also used to determine if dead
+	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 */
+	struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
 /*
  * Per-page state returned by heap_page_prune_and_freeze()
  */
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 } PruneFreezeResult;
 
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
-	PRUNE_ON_ACCESS,			/* on-access pruning */
-	PRUNE_VACUUM_SCAN,			/* VACUUM 1st heap pass */
-	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
-} PruneReason;
 
 /* ----------------
  *		function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
-									   GlobalVisState *vistest,
-									   int options,
-									   struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
-									   PruneReason reason,
 									   OffsetNumber *off_loc,
 									   TransactionId *new_relfrozen_xid,
 									   MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..8a626d633d5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2340,6 +2340,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneFreezeParams
 PruneFreezeResult
 PruneReason
 PruneState
-- 
2.43.0
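The call sites in the patch above now fill in a PruneFreezeParams struct instead of passing relation, buffer, vistest, options, cutoffs and reason as separate arguments. The following is a minimal standalone sketch of that pattern, not PostgreSQL code. Relation, Buffer, GlobalVisState and VacuumCutoffs are stand-in typedefs, the flag values are assumed, and the helpers make_on_access_params(), make_vacuum_scan_params() and params_are_consistent() are hypothetical. The helpers mirror how heap_page_prune_opt() and lazy_scan_prune() populate the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL types; not the real definitions. */
typedef int Relation;
typedef int Buffer;
typedef struct GlobalVisState GlobalVisState;
typedef struct VacuumCutoffs { unsigned OldestXmin; } VacuumCutoffs;

/* Assumed flag values for the two prune options. */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE          (1 << 1)

typedef enum { PRUNE_ON_ACCESS, PRUNE_VACUUM_SCAN, PRUNE_VACUUM_CLEANUP } PruneReason;

typedef struct PruneFreezeParams
{
    Relation    relation;
    Buffer      buffer;
    PruneReason reason;
    int         options;
    GlobalVisState *vistest;
    struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;

/* Like heap_page_prune_opt(): no freezing, so no cutoffs are needed. */
static PruneFreezeParams
make_on_access_params(Relation rel, Buffer buf, GlobalVisState *vistest)
{
    PruneFreezeParams params = {
        .relation = rel,
        .buffer = buf,
        .reason = PRUNE_ON_ACCESS,
        .options = 0,
        .vistest = vistest,
        .cutoffs = NULL,
    };
    return params;
}

/* Like lazy_scan_prune(): always freeze; mark unused now if no indexes. */
static PruneFreezeParams
make_vacuum_scan_params(Relation rel, Buffer buf, GlobalVisState *vistest,
                        VacuumCutoffs *cutoffs, int nindexes)
{
    PruneFreezeParams params = {
        .relation = rel,
        .buffer = buf,
        .reason = PRUNE_VACUUM_SCAN,
        .options = HEAP_PAGE_PRUNE_FREEZE,
        .vistest = vistest,
        .cutoffs = cutoffs,
    };
    if (nindexes == 0)
        params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
    return params;
}

/* Encodes the documented rule: cutoffs are required when freezing. */
static bool
params_are_consistent(const PruneFreezeParams *params)
{
    if (params->options & HEAP_PAGE_PRUNE_FREEZE)
        return params->cutoffs != NULL;
    return true;
}
```

One design note: the patch assigns every field explicitly. Designated initializers, as used in this sketch, also zero-fill any field a caller forgets, which might be worth considering if the struct grows more fields later.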



  [text/x-patch] v16-0003-Add-helper-for-freeze-determination-to-heap_page.patch (7.0K, 5-v16-0003-Add-helper-for-freeze-determination-to-heap_page.patch)
From f269cdce51b10d0b5ccc0e047ff08b247e6adf89 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v16 03/14] Add helper for freeze determination to
 heap_page_prune_and_freeze

After scanning through the line pointers on the heap page during
vacuum's first phase, we use the status information collected along the
way to determine whether or not to execute the freeze plans we
assembled.

Do this in a helper for better readability.
---
 src/backend/access/heap/pruneheap.c | 196 +++++++++++++++++-----------
 1 file changed, 117 insertions(+), 79 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9ba89b1fc28..f819ab57d55 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -301,6 +301,118 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	}
 }
 
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have been decided
+ * before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+					  bool did_tuple_hint_fpi,
+					  bool do_prune,
+					  bool do_hint_prune,
+					  PruneState *prstate)
+{
+	bool		do_freeze = false;
+
+	/*
+	 * If the caller specified we should not attempt to freeze any tuples,
+	 * validate that everything is in the right state and exit.
+	 */
+	if (!prstate->attempt_freeze)
+	{
+		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		return false;
+	}
+
+	if (prstate->pagefrz.freeze_required)
+	{
+		/*
+		 * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+		 * before FreezeLimit/MultiXactCutoff is present.  Must freeze to
+		 * advance relfrozenxid/relminmxid.
+		 */
+		do_freeze = true;
+	}
+	else
+	{
+		/*
+		 * Opportunistically freeze the page if we are generating an FPI
+		 * anyway and if doing so means that we can set the page all-frozen
+		 * afterwards (might not happen until VACUUM's final heap pass).
+		 *
+		 * XXX: Previously, we knew if pruning emitted an FPI by checking
+		 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze and
+		 * prune records were combined, this heuristic couldn't be used
+		 * anymore.  The opportunistic freeze heuristic must be improved;
+		 * however, for now, try to approximate the old logic.
+		 */
+		if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen.  Have already emitted
+			 * an FPI or will do so anyway?
+			 */
+			if (RelationNeedsWAL(relation))
+			{
+				if (did_tuple_hint_fpi)
+					do_freeze = true;
+				else if (do_prune)
+				{
+					if (XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+				else if (do_hint_prune)
+				{
+					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+						do_freeze = true;
+				}
+			}
+		}
+	}
+
+	if (do_freeze)
+	{
+		/*
+		 * Validate the tuples we will be freezing before entering the
+		 * critical section.
+		 */
+		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+	}
+	else if (prstate->nfrozen > 0)
+	{
+		/*
+		 * The page contained some tuples that were not already frozen, and we
+		 * chose not to freeze them now.  The page won't be all-frozen then.
+		 */
+		Assert(!prstate->pagefrz.freeze_required);
+
+		prstate->all_frozen = false;
+		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
+	}
+	else
+	{
+		/*
+		 * We have no freeze plans to execute.  The page might already be
+		 * all-frozen (perhaps only following pruning), though.  Such pages
+		 * can be marked all-frozen in the VM by our caller, even though none
+		 * of its tuples were newly frozen here.
+		 */
+	}
+
+	return do_freeze;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -662,85 +774,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = false;
-	if (prstate.attempt_freeze)
-	{
-		if (prstate.pagefrz.freeze_required)
-		{
-			/*
-			 * heap_prepare_freeze_tuple indicated that at least one XID/MXID
-			 * from before FreezeLimit/MultiXactCutoff is present.  Must
-			 * freeze to advance relfrozenxid/relminmxid.
-			 */
-			do_freeze = true;
-		}
-		else
-		{
-			/*
-			 * Opportunistically freeze the page if we are generating an FPI
-			 * anyway and if doing so means that we can set the page
-			 * all-frozen afterwards (might not happen until VACUUM's final
-			 * heap pass).
-			 *
-			 * XXX: Previously, we knew if pruning emitted an FPI by checking
-			 * pgWalUsage.wal_fpi before and after pruning.  Once the freeze
-			 * and prune records were combined, this heuristic couldn't be
-			 * used anymore.  The opportunistic freeze heuristic must be
-			 * improved; however, for now, try to approximate the old logic.
-			 */
-			if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
-			{
-				/*
-				 * Freezing would make the page all-frozen.  Have already
-				 * emitted an FPI or will do so anyway?
-				 */
-				if (RelationNeedsWAL(params->relation))
-				{
-					if (did_tuple_hint_fpi)
-						do_freeze = true;
-					else if (do_prune)
-					{
-						if (XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-					else if (do_hint_prune)
-					{
-						if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
-							do_freeze = true;
-					}
-				}
-			}
-		}
-	}
-
-	if (do_freeze)
-	{
-		/*
-		 * Validate the tuples we will be freezing before entering the
-		 * critical section.
-		 */
-		heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
-	}
-	else if (prstate.nfrozen > 0)
-	{
-		/*
-		 * The page contained some tuples that were not already frozen, and we
-		 * chose not to freeze them now.  The page won't be all-frozen then.
-		 */
-		Assert(!prstate.pagefrz.freeze_required);
-
-		prstate.all_frozen = false;
-		prstate.nfrozen = 0;	/* avoid miscounts in instrumentation */
-	}
-	else
-	{
-		/*
-		 * We have no freeze plans to execute.  The page might already be
-		 * all-frozen (perhaps only following pruning), though.  Such pages
-		 * can be marked all-frozen in the VM by our caller, even though none
-		 * of its tuples were newly frozen here.
-		 */
-	}
+	do_freeze = heap_page_will_freeze(params->relation, buffer,
+									  did_tuple_hint_fpi,
+									  do_prune,
+									  do_hint_prune,
+									  &prstate);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
-- 
2.43.0
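heap_page_will_freeze() in the patch above mixes the freeze decision with side effects: it calls heap_pre_freeze_checks() and clears all_frozen and nfrozen. To review the decision table on its own, the sketch below pulls the boolean logic out into a pure function. FreezeInputs and will_freeze() are hypothetical. Each field stands in for a PruneState member or for a call the patch makes (RelationNeedsWAL(), XLogCheckBufferNeedsBackup(), XLogHintBitIsNeeded()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs to the freeze decision. */
typedef struct FreezeInputs
{
    bool attempt_freeze;      /* HEAP_PAGE_PRUNE_FREEZE was passed */
    bool freeze_required;     /* pagefrz.freeze_required */
    bool all_visible;
    bool all_frozen;
    int  nfrozen;             /* number of prepared freeze plans */
    bool needs_wal;           /* RelationNeedsWAL(relation) */
    bool did_tuple_hint_fpi;  /* hint-bit setting already emitted an FPI */
    bool do_prune;
    bool do_hint_prune;
    bool buffer_needs_backup; /* XLogCheckBufferNeedsBackup(buffer) */
    bool hint_bit_needed;     /* XLogHintBitIsNeeded() */
} FreezeInputs;

static bool
will_freeze(const FreezeInputs *in)
{
    if (!in->attempt_freeze)
        return false;

    /* Old XIDs/MXIDs present: must freeze to advance relfrozenxid. */
    if (in->freeze_required)
        return true;

    /* Opportunistic path: only if freezing makes the page all-frozen ... */
    if (!(in->all_visible && in->all_frozen && in->nfrozen > 0))
        return false;

    /* ... and an FPI has been, or will be, emitted anyway. */
    if (!in->needs_wal)
        return false;
    if (in->did_tuple_hint_fpi)
        return true;
    if (in->do_prune)
        return in->buffer_needs_backup;
    if (in->do_hint_prune)
        return in->hint_bit_needed && in->buffer_needs_backup;
    return false;
}
```

Written this way, every `else if` branch of the original reduces to a single return, so the cases where opportunistic freezing is skipped are easy to spot.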



  [text/x-patch] v16-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (12.1K, 6-v16-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch)
From 4312376fff987b32d4599ccd78893c8c2f7770e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v16 01/14] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c        | 44 ++++++++++------
 src/backend/access/heap/heapam_xlog.c   | 52 ++++++++++++++++++-
 src/backend/access/heap/visibilitymap.c | 68 ++++++++++++++++++++++++-
 src/backend/access/rmgrdesc/heapdesc.c  |  5 ++
 src/include/access/visibilitymap.h      |  3 ++
 5 files changed, 154 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
 
 		if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+		{
 			all_frozen_set = true;
+			/* Lock the vmbuffer before entering the critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		}
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		 * going to add further frozen rows to it.
 		 *
 		 * If we're only adding already frozen rows to a previously empty
-		 * page, mark it as all-visible.
+		 * page, mark it as all-frozen and update the visibility map. We
+		 * already hold a pin and an exclusive lock on the vmbuffer.
 		 */
 		if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 		{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 		else if (all_frozen_set)
+		{
 			PageSetAllVisible(page);
+			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(relation));
+		}
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			xlrec->flags = 0;
 			if (all_visible_cleared)
 				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+			/*
+			 * We don't have to worry about including a conflict xid in the
+			 * WAL record as HEAP_INSERT_FROZEN intentionally violates
+			 * visibility rules.
+			 */
 			if (all_frozen_set)
 				xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
 
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 			XLogBeginInsert();
 			XLogRegisterData(xlrec, tupledata - scratch.data);
+
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+			if (all_frozen_set)
+				XLogRegisterBuffer(1, vmbuffer, 0);
 
 			XLogRegisterBufData(0, tupledata, totaldatalen);
 
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(page, recptr);
+			if (all_frozen_set)
+			{
+				Assert(BufferIsDirty(vmbuffer));
+				PageSetLSN(BufferGetPage(vmbuffer), recptr);
+			}
 		}
 
 		END_CRIT_SECTION();
 
-		/*
-		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
-		 */
 		if (all_frozen_set)
-		{
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..c2c7e6ab086 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	int			i;
 	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
 
 	/*
 	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);
-		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
+		vmbuffer = InvalidBuffer;
 		FreeFakeRelcacheEntry(reln);
 	}
 
@@ -662,6 +663,55 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (BufferIsValid(buffer))
 		UnlockReleaseBuffer(buffer);
 
+	buffer = InvalidBuffer;
+
+	/*
+	 * Read and update the visibility map (VM) block.
+	 *
+	 * We must always redo VM changes, even if the corresponding heap page
+	 * update was skipped due to the LSN interlock. Each VM block covers
+	 * multiple heap pages, so later WAL records may update other bits in the
+	 * same block. If this record includes a full-page image (FPI), subsequent
+	 * WAL records may depend on it to guard against torn pages.
+	 *
+	 * Heap page changes are replayed first to preserve the invariant:
+	 * PD_ALL_VISIBLE must be set on the heap page if the VM bit is set.
+	 *
+	 * Note that we released the heap page lock above. Under normal operation,
+	 * this would be unsafe: a concurrent modification could clear
+	 * PD_ALL_VISIBLE while the VM bit remained set, violating the invariant.
+	 *
+	 * During recovery, however, no concurrent writers exist. Therefore,
+	 * updating the VM without holding the heap page lock is safe enough. This
+	 * same approach is taken when replaying xl_heap_visible records (see
+	 * heap_xlog_visible()).
+	 */
+	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		/* We don't have the relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN,
+								 relname);
+
+		PageSetLSN(BufferGetPage(vmbuffer), lsn);
+		pfree(relname);
+	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
 	/*
 	 * If the page is running low on free space, update the FSM as well.
 	 * Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..738105eb97e 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
+ *		visibilitymap_set_vmbits - set bit(s) in a pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -321,6 +322,71 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	return status;
 }
 
+/*
+ * Set visibility map (VM) flags in the block referenced by vmBuf.
+ *
+ * This function is intended for callers that log VM changes together
+ * with the heap page modifications that rendered the page all-visible.
+ * Callers that log VM changes separately should use visibilitymap_set().
+ *
+ * Caller responsibilities:
+ * - vmBuf must be pinned and exclusively locked, and it must cover the
+ *   VM bits corresponding to heapBlk.
+ * - In normal operation (not recovery), this must be called inside a
+ *   critical section that also applies the necessary heap page changes
+ *   and, if applicable, emits WAL.
+ * - The caller is responsible for WAL logging the VM buffer changes and
+ *   for any required modifications to the associated heap page. This
+ *   includes preserving invariants such as holding a pin and exclusive
+ *   lock on the buffer containing heapBlk.
+ *
+ * heapRelname is used only for debugging.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags,
+						 const char *heapRelname)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+	Page		page;
+	uint8	   *map;
+	uint8		status;
+
+#ifdef TRACE_VISIBILITYMAP
+	elog(DEBUG1, "vm_set flags 0x%02X for %s %u",
+		 flags, heapRelname, heapBlk);
+#endif
+
+	/* Call in same critical section where WAL is emitted. */
+	Assert(InRecovery || CritSectionCount > 0);
+
+	/* Flags should be valid. Also never clear bits with this function */
+	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+	/* Must never set all_frozen bit without also setting all_visible bit */
+	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+	/* Check that we have the right VM page pinned */
+	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+		elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+	Assert(BufferIsExclusiveLocked(vmBuf));
+
+	page = BufferGetPage(vmBuf);
+	map = (uint8 *) PageGetContents(page);
+
+	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+	if (flags != status)
+	{
+		map[mapByte] |= (flags << mapOffset);
+		MarkBufferDirty(vmBuf);
+	}
+
+	return status;
+}
+
 /*
  *	visibilitymap_get_status - get status of bits
  *
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
 #include "storage/standbydefs.h"
 
 /*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
+		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+			appendStringInfo(buf, ", vm_flags: 0x%02X",
+							 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
 		if (XLogRecHasBlockData(record, 0) && !isinit)
 		{
 			appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+									  Buffer vmBuf, uint8 flags,
+									  const char *heapRelname);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
-- 
2.43.0



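The visibilitymap_set_vmbits() function added by the patch above only ORs bits into the map and returns the bits that were present before, which lets a caller tell whether the VM page actually changed. A minimal sketch of that contract on a single map byte, with buffer locking, critical sections, and WAL left out; the flag values match visibilitymapdefs.h, while the `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in visibilitymapdefs.h; names are hypothetical. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02
#define SKETCH_VM_VALID_BITS  (SKETCH_VM_ALL_VISIBLE | SKETCH_VM_ALL_FROZEN)

/*
 * Set (never clear) 'flags' in the 2-bit slot at map_offset of *map_byte.
 * Returns the slot's previous bits; *dirtied stands in for MarkBufferDirty().
 */
static uint8_t
sketch_set_vmbits(uint8_t *map_byte, uint8_t map_offset, uint8_t flags,
				  bool *dirtied)
{
	uint8_t		status = (uint8_t) ((*map_byte >> map_offset) &
									SKETCH_VM_VALID_BITS);

	*dirtied = false;
	if (flags != status)
	{
		/* OR only: bits already set for this or other blocks are kept */
		*map_byte |= (uint8_t) (flags << map_offset);
		*dirtied = true;
	}
	return status;
}
```

A caller compares the returned bits with the requested flags, as heap_page_prune_and_freeze() does with old_vmbits and new_vmbits, to decide whether a WAL record is still needed.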
  [text/x-patch] v16-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patch (50.6K, 7-v16-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patch)
  download | inline diff:
From 0141c10d30bd7ea620d16d24201ba22e5337a4dc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:52:08 -0400
Subject: [PATCH v16 06/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum
 prune/freeze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum’s prune/freeze work, not to pruning
performed during normal page access.
---
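Because a single XLOG_HEAP2_PRUNE_VACUUM_SCAN record now covers pruning, freezing, and the VM update, it must carry one snapshot conflict horizon that is safe for all of them. The sketch below mirrors the get_conflict_xid() helper this patch adds to pruneheap.c, using a simplified stand-in for TransactionIdFollows() (numeric comparison for special XIDs, modulo-2^32 comparison for normal ones). The `Sketch`/`sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID      ((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

/*
 * Is id1 logically newer than id2?  Special XIDs compare numerically;
 * normal XIDs compare modulo 2^32, in the style of TransactionIdFollows().
 */
static bool
sketch_xid_follows(SketchXid id1, SketchXid id2)
{
	if (id1 < SKETCH_FIRST_NORMAL_XID || id2 < SKETCH_FIRST_NORMAL_XID)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* One horizon covering every change the record makes. */
static SketchXid
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					SketchXid latest_xid_removed,
					SketchXid frz_conflict_horizon,
					SketchXid visibility_cutoff_xid,
					bool blk_already_av, bool set_blk_all_frozen)
{
	SketchXid	conflict_xid = SKETCH_INVALID_XID;

	/* Start from the VM horizon if setting the VM, else the freeze one. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Removed tuples with a newer xmax push the horizon forward. */
	if (sketch_xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/* Only marking an already all-visible page all-frozen: no conflict. */
	if (!do_prune && !do_freeze && do_set_vm &&
		blk_already_av && set_blk_all_frozen)
		conflict_xid = SKETCH_INVALID_XID;

	return conflict_xid;
}
```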
 src/backend/access/heap/heapam_xlog.c  | 158 +++++++--
 src/backend/access/heap/pruneheap.c    | 474 ++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c   | 202 +----------
 src/backend/access/rmgrdesc/heapdesc.c |  11 +-
 src/include/access/heapam.h            |  36 +-
 src/include/access/heapam_xlog.h       |  17 +-
 6 files changed, 584 insertions(+), 314 deletions(-)

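As the commit message explains, heap_page_prune_and_freeze() now decides on the spot whether to set the VM. The flag decision made by the new heap_page_will_set_vis() helper can be sketched as a pure function of booleans. In this sketch the VM_ALL_FROZEN() lookup is replaced by a vm_all_frozen argument and the two corruption-repair branches are omitted; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in visibilitymapdefs.h; names are hypothetical. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/*
 * Decide which VM bits to set after pruning/freezing. all_visible and
 * all_frozen describe the page after pruning; pd_all_visible is the current
 * page-level bit; blk_known_av and vm_all_frozen describe the current VM.
 */
static bool
sketch_will_set_vis(bool attempt_update_vm, bool all_visible, bool all_frozen,
					bool pd_all_visible, bool blk_known_av, bool vm_all_frozen,
					uint8_t *vmflags, bool *do_set_pd_vis)
{
	*vmflags = 0;
	*do_set_pd_vis = false;

	if (!attempt_update_vm)
		return false;

	/* Set PD_ALL_VISIBLE if the page is all-visible but the bit is clear. */
	if (all_visible && !pd_all_visible)
		*do_set_pd_vis = true;

	/* Set VM bits only if they would add information to the current VM. */
	if ((all_visible && !blk_known_av) || (all_frozen && !vm_all_frozen))
	{
		*vmflags = SKETCH_VM_ALL_VISIBLE;
		if (all_frozen)
			*vmflags |= SKETCH_VM_ALL_FROZEN;
		return true;
	}
	return false;
}
```

The common VM-only case is an already all-visible page becoming all-frozen: PD_ALL_VISIBLE is left alone and both VM bits are requested.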
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c2c7e6ab086..911416bbc56 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Buffer		buffer;
 	RelFileLocator rlocator;
 	BlockNumber blkno;
-	XLogRedoAction action;
+	Buffer		vmbuffer = InvalidBuffer;
+	uint8		vmflags = 0;
+	Size		freespace = 0;
 
 	XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
 	memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
 
+	if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+	{
+		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+	}
+
 	/*
-	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. That means we
+	 * can't replay this record if it removes tuples that are still visible to
+	 * transactions on the standby, freeze tuples with xids that are still
+	 * considered running on the standby, or set a page as all-visible in the
+	 * VM if it isn't all-visible to all transactions on the standby.
 	 */
 	if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
 	{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}
 
 	/*
-	 * If we have a full-page image, restore it and we're done.
+	 * If we have a full-page image of the heap block, restore it and we're
+	 * done with the heap block.
 	 */
-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;
@@ -90,6 +103,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		xlhp_freeze_plan *plans;
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+		bool		do_prune;
+		bool		mark_buffer_dirty = false;
+		bool		set_lsn = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -97,11 +113,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 											   &ndead, &nowdead,
 											   &nunused, &nowunused);
 
+		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+		/* Ensure the record does something */
+		Assert(do_prune || nplans > 0 || (vmflags & VISIBILITYMAP_VALID_BITS));
+
 		/*
 		 * Update all line pointers per the record, and repair fragmentation
 		 * if needed.
 		 */
-		if (nredirected > 0 || ndead > 0 || nunused > 0)
+		if (do_prune)
 			heap_page_prune_execute(buffer,
 									(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
 									redirected, nredirected,
@@ -138,36 +159,121 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
+
+		/*
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * If this record only sets the VM, no need to dirty the heap page.
+		 */
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * Setting PD_ALL_VISIBLE is always WAL-logged, but the record
+			 * includes an FPI only if checksums or wal_log_hints are enabled.
+			 * Advance the page LSN only if the record could include an FPI,
+			 * since recovery skips records at or below the page LSN; otherwise
+			 * it might skip an earlier FPI needed to repair a torn page.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
+			PageSetLSN(page, lsn);
+
 		/*
 		 * Note: we don't worry about updating the page's prunability hints.
 		 * At worst this will cause an extra prune cycle to occur soon.
 		 */
-
-		PageSetLSN(page, lsn);
-		MarkBufferDirty(buffer);
 	}
 
 	/*
-	 * If we released any space or line pointers, update the free space map.
+	 * If we released any space or line pointers, or set PD_ALL_VISIBLE or
+	 * the VM bits, update the free space map.
+	 *
+	 * Even when no actual space is freed (e.g., when only marking the page
+	 * all-visible or frozen), we still update the FSM. Because the FSM is
+	 * unlogged and maintained heuristically, it often becomes stale on
+	 * standbys. If such a standby is later promoted and runs VACUUM, it will
+	 * skip recalculating free space for pages that were marked all-visible
+	 * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
+	 * propagate overly optimistic free space values upward, causing future
+	 * insertions to select pages that turn out to be unusable. In bulk, this
+	 * can lead to long stalls.
+	 *
+	 * To prevent this, always refresh the FSM's view when a page becomes
+	 * all-visible or all-frozen.
+	 *
+	 * Do this regardless of whether a full-page image was applied, since FSM
+	 * data is not part of the page itself.
 	 *
-	 * Do this regardless of a full-page image being applied, since the FSM
-	 * data is not in the page anyway.
 	 */
 	if (BufferIsValid(buffer))
 	{
-		if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
-						   XLHP_HAS_DEAD_ITEMS |
-						   XLHP_HAS_NOW_UNUSED_ITEMS))
-		{
-			Size		freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+		if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+							XLHP_HAS_DEAD_ITEMS |
+							XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+			(vmflags & VISIBILITYMAP_VALID_BITS))
+			freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
 
-			UnlockReleaseBuffer(buffer);
+		/*
+		 * We want to avoid holding an exclusive lock on the heap buffer while
+		 * doing IO (either of the FSM or the VM), so we'll release the lock
+		 * on the heap buffer before doing either.
+		 */
+		UnlockReleaseBuffer(buffer);
+	}
 
-			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+	/*
+	 * Now read and update the VM block.
+	 *
+	 * We must redo changes to the VM even if the heap page was skipped due to
+	 * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+	 * on replaying changes to the VM.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) &&
+		XLogReadBufferForRedoExtended(record, 1,
+									  RBM_ZERO_ON_ERROR,
+									  false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		vmpage = BufferGetPage(vmbuffer);
+		char	   *relname;
+		uint8		old_vmbits = 0;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(vmpage))
+			PageInit(vmpage, BLCKSZ, 0);
+
+		/* We don't have relation name during recovery, so use relfilenode */
+		relname = psprintf("%u", rlocator.relNumber);
+		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+
+		/* Only set VM page LSN if we modified the page */
+		if (old_vmbits != vmflags)
+		{
+			Assert(BufferIsDirty(vmbuffer));
+			PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		}
-		else
-			UnlockReleaseBuffer(buffer);
+		pfree(relname);
 	}
+
+	if (BufferIsValid(vmbuffer))
+		UnlockReleaseBuffer(vmbuffer);
+
+	if (freespace > 0)
+		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
 /*
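In heap_xlog_prune_freeze(), whether the heap page is dirtied and whether its LSN is advanced now depend on what the record actually does. The sketch below isolates that decision, with buffers and pages replaced by booleans and hint_bits_needed standing in for XLogHintBitIsNeeded(); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VM_VALID_BITS 0x03

typedef struct SketchRedoHeapActions
{
	bool		set_pd_all_visible; /* would call PageSetAllVisible() */
	bool		mark_dirty;			/* would call MarkBufferDirty() */
	bool		set_lsn;			/* would call PageSetLSN() */
} SketchRedoHeapActions;

/*
 * hint_bits_needed stands in for XLogHintBitIsNeeded(), i.e. data checksums
 * or wal_log_hints are enabled.
 */
static SketchRedoHeapActions
sketch_redo_heap_actions(bool do_prune, int nplans, uint8_t vmflags,
						 bool page_all_visible, bool hint_bits_needed)
{
	SketchRedoHeapActions act = {false, false, false};

	/* Pruning or freezing always dirties the page and advances its LSN. */
	if (do_prune || nplans > 0)
		act.mark_dirty = act.set_lsn = true;

	/* Never leave a VM bit set while PD_ALL_VISIBLE is clear. */
	if ((vmflags & SKETCH_VM_VALID_BITS) && !page_all_visible)
	{
		act.set_pd_all_visible = true;
		act.mark_dirty = true;
		if (hint_bits_needed)
			act.set_lsn = true;
	}
	return act;
}
```

The VM block itself is handled afterwards and is redone even when the heap block was skipped by the LSN interlock.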
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c23a6a21a7f..f384d74416a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -43,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	const struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -132,17 +135,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze(), to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether or not to opportunistically
-	 * freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -173,6 +176,19 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
 
 static void page_verify_redirects(Page page);
 
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid, bool blk_already_av,
+									  bool set_blk_all_frozen);
+
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneState *prstate,
+								   uint8 *vmflags,
+								   bool *do_set_pd_vis);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -258,6 +274,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
+			params.vmbuffer = InvalidBuffer;
+			params.blk_known_av = false;
 
 			/*
 			 * For now, pass mark_unused_now as false regardless of whether or
@@ -431,10 +449,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, with the desired flags
+ * in *vmflags. Also sets *do_set_pd_vis to indicate whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneState *prstate,
+					   uint8 *vmflags,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		do_set_vm = false;
+
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		Assert(*vmflags == 0);
+		return false;
+	}
+
+	if (prstate->all_visible && !PageIsAllVisible(heap_page))
+		*do_set_pd_vis = true;
+
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*vmflags = VISIBILITYMAP_ALL_VISIBLE;
+		if (prstate->all_frozen)
+			*vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+		do_set_vm = true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return do_set_vm;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -449,12 +565,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * it's required in order to advance relfrozenxid / relminmxid, or if it's
  * considered advantageous for overall system performance to do so now.  The
  * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.  When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set.  They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params->options and the page's
+ * visibility status has changed, we update the VM at the same time as we
+ * prune and freeze the heap page. We also set presult->old_vmbits and
+ * presult->new_vmbits to the VM state before and after the update, for the
+ * caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -479,6 +596,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	OffsetNumber offnum,
@@ -488,15 +606,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	TransactionId frz_conflict_horizon = InvalidTransactionId;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Copy parameters to prstate */
 	prstate.vistest = params->vistest;
 	prstate.mark_unused_now =
 		(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
 	prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate.attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate.cutoffs = params->cutoffs;
 
 	/*
@@ -543,50 +668,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prstate.deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Caller may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
+	 *
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible when we see LP_DEAD items.  We fix that after
-	 * scanning the line pointers, before we return the value to the caller,
-	 * so that the caller doesn't set the VM bit incorrectly.
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * when we encounter LP_DEAD items. Instead, we correct all_visible after
+	 * deciding whether to freeze, but before updating the VM, to avoid
+	 * setting the VM bit incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate.attempt_freeze)
 	{
 		prstate.all_visible = true;
 		prstate.all_frozen = true;
 	}
+	else if (prstate.attempt_update_vm)
+	{
+		prstate.all_visible = true;
+		prstate.all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate.all_visible = false;
 		prstate.all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. It is kept up to date only while the
+	 * page appears all-visible: once we encounter a tuple that is not visible
+	 * to all, we stop maintaining it. While it is maintained, it can be used
+	 * to calculate the snapshot conflict horizon when updating the VM and/or
+	 * freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -818,6 +947,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.latest_xid_removed, frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av,
+									(do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
 
@@ -838,14 +996,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -859,64 +1020,91 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
 
-		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
-		 */
-		if (RelationNeedsWAL(params->relation))
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
+			Assert(PageIsAllVisible(page));
 
-			if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
-				conflict_xid = frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  RelationGetRelationName(params->relation));
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occurred */
+				do_set_vm = false;
+			}
+		}
 
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only updating the VM and it turned out to be already set, we will
+		 * have unset do_set_vm above, so check it again before emitting the
+		 * record.
+		 */
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
 			log_heap_prune_and_freeze(params->relation, buffer,
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
 									  prstate.nowunused, prstate.nunused);
-		}
 	}
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_is_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_is_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
+
+		if (!heap_page_is_all_visible(params->relation, buffer,
+									  prstate.cutoffs->OldestXmin,
+									  &debug_all_frozen,
+									  &debug_cutoff, off_loc))
+			Assert(false);
+
+		Assert(prstate.all_frozen == debug_all_frozen);
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.visibility_cutoff_xid);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -2058,6 +2246,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 	return nplans;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av,
+				 bool set_blk_all_frozen)
+{
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		conflict_xid = InvalidTransactionId;
+
+	return conflict_xid;
+}
+
 /*
  * Write an XLOG_HEAP2_PRUNE* WAL record
  *
@@ -2078,14 +2324,24 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * replaying 'unused' items depends on whether they were all previously marked
  * as dead.
  *
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
  */
 void
 log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2351,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xl_heap_prune xlrec;
 	XLogRecPtr	recptr;
 	uint8		info;
+	uint8		regbuf_flags;
 
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2360,23 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	xlhp_prune_items dead_items;
 	xlhp_prune_items unused_items;
 	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
 
 	xlrec.flags = 0;
+	regbuf_flags = REGBUF_STANDARD;
+
+	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+	/*
+	 * We can avoid an FPI if the only modification we are making to the heap
+	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+	 * Note that if we explicitly skip an FPI, we must not set the heap page
+	 * LSN later.
+	 */
+	if (!do_prune &&
+		nfrozen == 0 &&
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
+		regbuf_flags |= REGBUF_NO_IMAGE;
 
 	/*
 	 * Prepare data for the buffer.  The arrays are not actually in the
@@ -2112,7 +2384,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * page image, the arrays can be omitted.
 	 */
 	XLogBeginInsert();
-	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+	XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+		XLogRegisterBuffer(1, vmbuffer, 0);
+
 	if (nfrozen > 0)
 	{
 		int			nplans;
@@ -2169,6 +2445,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * Prepare the main xl_heap_prune record.  We already set the XLHP_HAS_*
 	 * flag above.
 	 */
+	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+	{
+		xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+			xlrec.flags |= XLHP_VM_ALL_FROZEN;
+	}
 	if (RelationIsAccessibleInLogicalDecoding(relation))
 		xlrec.flags |= XLHP_IS_CATALOG_REL;
 	if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2483,23 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 	recptr = XLogInsert(RM_HEAP2_ID, info);
 
-	PageSetLSN(BufferGetPage(buffer), recptr);
+	if (vmflags & VISIBILITYMAP_VALID_BITS)
+	{
+		Assert(BufferIsDirty(vmbuffer));
+		PageSetLSN(BufferGetPage(vmbuffer), recptr);
+	}
+
+	/*
+	 * We must bump the page LSN if pruning or freezing. If we are only
+	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
+	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+	 * for page hint updates.
+	 */
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
+	{
+		Assert(BufferIsDirty(buffer));
+		PageSetLSN(BufferGetPage(buffer), recptr);
+	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index aed1f8e1139..39526bf608f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1958,6 +1958,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	params.reason = PRUNE_VACUUM_SCAN;
 	params.cutoffs = &vacrel->cutoffs;
 	params.vistest = vacrel->vistest;
+	params.vmbuffer = vmbuffer;
+	params.blk_known_av = all_visible_according_to_vm;
 
 	/*
 	 * Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1974,7 +1976,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * tuples. Pruning will have determined whether or not the page is
 	 * all-visible.
 	 */
-	params.options = HEAP_PAGE_PRUNE_FREEZE;
+	params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
@@ -1997,33 +1999,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2057,168 +2032,26 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
 	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
+		vacrel->vm_new_visible_pages++;
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_frozen_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			vacrel->vm_new_visible_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
@@ -2892,8 +2725,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidTransactionId,
+								  InvalidBuffer,	/* vmbuffer */
+								  0,	/* vmflags */
+								  InvalidTransactionId, /* conflict_xid */
 								  false,	/* no cleanup lock required */
+								  false,	/* set_pd_all_vis */
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
  * code, the latter of which is used in frontend (pg_waldump) code.
  */
 void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 									   int *nplans, xlhp_freeze_plan **plans,
 									   OffsetNumber **frz_offsets,
 									   int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", isCatalogRel: %c",
 						 xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
 
+		if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+		{
+			uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+			if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+				vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+		}
+
 		if (XLogRecHasBlockData(record, 0))
 		{
 			Size		datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ea67fb83fbe..2de39ba0cd1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block
+	 * of the visibility map if we are to update it.
+	 *
+	 * blk_known_av is the visibility status of the heap block as of the last
+	 * call to find_next_unskippable_block().
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
 	 * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
 	 * pruning.
 	 *
-	 * FREEZE indicates that we will also freeze tuples, and will return
-	 * 'all_visible', 'all_frozen' flags to the caller.
+	 * FREEZE indicates that we will also freeze tuples.
+	 *
+	 * UPDATE_VIS indicates that we will set the page's status in the VM.
 	 */
 	int			options;
 
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
+	 * old_vmbits holds the all-visible and all-frozen bits for the page as
+	 * they were in the visibility map before phase I of vacuuming updated
+	 * it; new_vmbits holds those bits after phase I.
 	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
+	uint8		new_vmbits;
+	uint8		old_vmbits;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -420,8 +428,10 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
 									OffsetNumber *nowunused, int nunused);
 extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
 extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..16c2b2e3c9c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
  * Main data section:
  *
  *	xl_heap_prune
- *		uint8				flags
+ *		uint16				flags
  *	TransactionId			snapshot_conflict_horizon
  *
  * Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
  */
 typedef struct xl_heap_prune
 {
-	uint8		flags;
+	uint16		flags;
 
 	/*
 	 * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,7 +292,7 @@ typedef struct xl_heap_prune
 	 */
 } xl_heap_prune;
 
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
 /* to handle recovery conflict during logical decoding on standby */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
@@ -330,6 +330,15 @@ typedef struct xl_heap_prune
 #define		XLHP_HAS_DEAD_ITEMS	        (1 << 6)
 #define		XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 7)
 
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define		XLHP_VM_ALL_VISIBLE			(1 << 8)
+#define		XLHP_VM_ALL_FROZEN			(1 << 9)
+
 /*
  * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
  * (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -497,7 +506,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
 								   uint8 vmflags);
 
 /* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
 												   OffsetNumber **frz_offsets,
 												   int *nredirected, OffsetNumber **redirected,
-- 
2.43.0
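For readers following get_conflict_xid() in the patch above, here is a minimal standalone sketch of the same horizon-selection logic. TransactionId, InvalidTransactionId, FirstNormalTransactionId and the wraparound-aware comparison are simplified stand-ins modelled on PostgreSQL's transam definitions (an assumption, not copied from the tree); xid_follows() and sketch_conflict_xid() are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's transaction ID definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical: circular (modulo 2^32) "does id1 logically follow id2?" */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    /* Special XIDs compare by plain numeric value. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical mirror of the patch's get_conflict_xid() decision order. */
static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                    TransactionId latest_xid_removed,
                    TransactionId frz_conflict_horizon,
                    TransactionId visibility_cutoff_xid,
                    bool blk_already_av, bool set_blk_all_frozen)
{
    TransactionId conflict_xid = InvalidTransactionId;

    /* Setting the VM uses the visibility cutoff; freezing alone uses its own horizon. */
    if (do_set_vm)
        conflict_xid = visibility_cutoff_xid;
    else if (do_freeze)
        conflict_xid = frz_conflict_horizon;

    /* Removing tuples with a newer xmax pushes the horizon forward. */
    if (xid_follows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;

    /* Only flipping an already all-visible page to all-frozen: no conflict. */
    if (!do_prune && !do_freeze && do_set_vm &&
        blk_already_av && set_blk_all_frozen)
        conflict_xid = InvalidTransactionId;

    return conflict_xid;
}
```

Because an InvalidTransactionId input never "follows" a normal XID, the pruning step only raises the horizon when something was actually removed; the wraparound-aware comparison is what lets a small post-wraparound XID win over a numerically larger pre-wraparound one.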

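The two heap-page WAL decisions in the patched log_heap_prune_and_freeze() (whether to register the heap buffer with REGBUF_NO_IMAGE, and whether to call PageSetLSN() on it) can be modelled in isolation as below. This is a sketch under stated assumptions: hint_bits_logged stands in for XLogHintBitIsNeeded(), and PruneWalDecision and decide_heap_wal() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the heap-page WAL decisions. */
typedef struct PruneWalDecision
{
    bool heap_no_image;   /* register heap buffer with REGBUF_NO_IMAGE */
    bool set_heap_lsn;    /* call PageSetLSN() on the heap page */
} PruneWalDecision;

/* hint_bits_logged models XLogHintBitIsNeeded() (checksums or wal_log_hints). */
static PruneWalDecision
decide_heap_wal(int nredirected, int ndead, int nunused, int nfrozen,
                bool set_pd_all_vis, bool hint_bits_logged)
{
    PruneWalDecision d;
    bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;

    /* Skip the FPI only when PD_ALL_VISIBLE is the sole, unlogged change. */
    d.heap_no_image = !do_prune && nfrozen == 0 &&
        (!set_pd_all_vis || !hint_bits_logged);

    /* Bump the LSN for real modifications, or for logged hint changes. */
    d.set_heap_lsn = do_prune || nfrozen > 0 ||
        (set_pd_all_vis && hint_bits_logged);

    /* An explicitly skipped FPI must never be paired with an LSN bump. */
    assert(d.heap_no_image != d.set_heap_lsn);
    return d;
}
```

By De Morgan's law the two conditions are exact complements, which is why the assertion can check inequality rather than just "not both": the comment in the patch saying an explicitly skipped FPI must not be followed by setting the heap page LSN holds by construction.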

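The rewritten counting at the end of lazy_scan_prune() derives the logging counters purely from old_vmbits and new_vmbits. A standalone sketch of that derivation follows; the VISIBILITYMAP_* values match visibilitymapdefs.h, while VmCounters and count_vm_change() are hypothetical stand-ins for the LVRelState fields and the *vm_page_frozen output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three LVRelState counters. */
typedef struct VmCounters
{
    int new_visible;         /* vm_new_visible_pages */
    int new_visible_frozen;  /* vm_new_visible_frozen_pages */
    int new_frozen;          /* vm_new_frozen_pages */
} VmCounters;

/* Returns true if the page was newly marked all-frozen (*vm_page_frozen). */
static bool
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
    bool page_frozen = false;

    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    {
        /* Newly all-visible, and possibly all-frozen in the same step. */
        c->new_visible++;
        if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
        {
            c->new_visible_frozen++;
            page_frozen = true;
        }
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
             (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
    {
        /* Already all-visible; only the all-frozen bit is new. */
        assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
        c->new_frozen++;
        page_frozen = true;
    }
    return page_frozen;
}
```

A page whose bits did not change falls through both branches and leaves every counter untouched.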

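The heapam_xlog.h and heapdesc.c hunks encode the VM bits into xl_heap_prune.flags as bits 8 and 9, which is why flags had to widen from uint8 to uint16. A small round-trip sketch of that translation follows; the constants are taken from the patch and visibilitymapdefs.h, while vmflags_to_xlhp() and xlhp_to_vmflags() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define XLHP_VM_ALL_VISIBLE (1 << 8)
#define XLHP_VM_ALL_FROZEN  (1 << 9)

/* Hypothetical: what log_heap_prune_and_freeze() ORs into xlrec.flags. */
static uint16_t
vmflags_to_xlhp(uint8_t vmflags)
{
    uint16_t flags = 0;

    /* All-frozen is only recorded together with all-visible. */
    if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
    {
        flags |= XLHP_VM_ALL_VISIBLE;
        if (vmflags & VISIBILITYMAP_ALL_FROZEN)
            flags |= XLHP_VM_ALL_FROZEN;
    }
    return flags;
}

/* Hypothetical: the reverse mapping, as heap2_desc() does for display. */
static uint8_t
xlhp_to_vmflags(uint16_t flags)
{
    uint8_t vmflags = 0;

    if (flags & XLHP_VM_ALL_VISIBLE)
    {
        vmflags = VISIBILITYMAP_ALL_VISIBLE;
        if (flags & XLHP_VM_ALL_FROZEN)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
    }
    return vmflags;
}
```

Truncating the encoded value to 8 bits drops both VM flags entirely, which is the concrete reason SizeOfHeapPrune and the deserialize signature change along with the struct.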
  [text/x-patch] v16-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (10.2K, 8-v16-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 95d94ee991ea163b4b7861a193b3a1a3497de73e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:54:38 -0400
Subject: [PATCH v16 07/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III

Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that becomes all-visible in vacuum's third phase, record the
visibility map update in the already emitted
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.

Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.

This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 174 +++++++++++++++++++--------
 1 file changed, 124 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 39526bf608f..cf1c2efc999 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,6 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 						   int num_offsets);
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2685,8 +2692,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
 
 	Assert(vacrel->do_index_vacuuming);
 
@@ -2697,6 +2706,31 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);
 
+	/*
+	 * Before marking dead items unused, check whether the page will become
+	 * all-visible once that change is applied. This lets us reap the tuples
+	 * and mark the page all-visible within the same critical section,
+	 * enabling both changes to be emitted in a single WAL record. Since the
+	 * visibility checks may perform I/O and allocate memory, they must be
+	 * done outside the critical section.
+	 */
+	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+									   vacrel->cutoffs.OldestXmin,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid,
+									   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+
 	START_CRIT_SECTION();
 
 	for (int i = 0; i < num_offsets; i++)
@@ -2716,6 +2750,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	/* Attempt to truncate line pointer array now */
 	PageTruncateLinePointerArray(page);
 
+	/*
+	 * The page is guaranteed to have had dead line pointers, so
+	 * PD_ALL_VISIBLE cannot already be set. Therefore, whenever we set the VM
+	 * bit, we must also set PD_ALL_VISIBLE. The heap page lock is held while
+	 * updating the VM to ensure consistency.
+	 */
+	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+	{
+		PageSetAllVisible(page);
+		visibilitymap_set_vmbits(blkno,
+								 vmbuffer, vmflags,
+								 RelationGetRelationName(vacrel->rel));
+		conflict_xid = visibility_cutoff_xid;
+	}
+
 	/*
 	 * Mark buffer dirty before we write WAL.
 	 */
@@ -2725,11 +2774,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (RelationNeedsWAL(vacrel->rel))
 	{
 		log_heap_prune_and_freeze(vacrel->rel, buffer,
-								  InvalidBuffer,	/* vmbuffer */
-								  0,	/* vmflags */
-								  InvalidTransactionId, /* conflict_xid */
+								  vmbuffer, vmflags,
+								  conflict_xid,
 								  false,	/* no cleanup lock required */
-								  false,	/* set_pd_all_vis */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
@@ -2737,41 +2785,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  unused, nunused);
 	}
 
-	/*
-	 * End critical section, so we safely can do visibility tests (which
-	 * possibly need to perform IO and allocate memory!). If we crash now the
-	 * page (including the corresponding vm bit) might not be marked all
-	 * visible, but that's fine. A later vacuum will fix that.
-	 */
 	END_CRIT_SECTION();
 
-	/*
-	 * Now that we have removed the LP_DEAD items from the page, once again
-	 * check if the page has become all-visible.  The page is already marked
-	 * dirty, exclusively locked, and, if needed, a full page image has been
-	 * emitted.
-	 */
-	Assert(!PageIsAllVisible(page));
-	if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
-								 &all_frozen,
-								 &visibility_cutoff_xid,
-								 &vacrel->offnum))
+	if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-		if (all_frozen)
-		{
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
-
 		/* Count the newly set VM page for logging */
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 		vacrel->vm_new_visible_pages++;
 		if (all_frozen)
 			vacrel->vm_new_visible_frozen_pages++;
@@ -3440,18 +3459,8 @@ dead_items_cleanup(LVRelState *vacrel)
 }
 
 /*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples.  Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * *logging_offnum will have the OffsetNumber of the current tuple being
- * processed for vacuum's error callback system.
- *
- * This is similar logic to that in heap_prune_record_unchanged_lp_normal() If
- * you change anything here, make sure that everything stays in sync.  Note
- * that an assertion calls us to verify that everybody still agrees.  Be sure
- * to avoid introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD items on the page.
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
@@ -3460,15 +3469,74 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+
+/*
+ * Check whether the heap page in buf is all-visible except for the dead
+ * tuples referenced in the deadoffsets array.
+ *
+ * The visibility checks may perform IO and allocate memory so they must not
+ * be done in a critical section. This function is used by vacuum to determine
+ * if the page will be all-visible once it reaps known dead tuples. That way
+ * it can do both in the same critical section and emit a single WAL record.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Output parameters:
+ *
+ *  - *all_frozen: true if every tuple on the page is frozen
+ *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *logging_offnum: OffsetNumber of current tuple being processed;
+ *     used by vacuum's error callback system.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is closely related to heap_prune_record_unchanged_lp_normal().
+ * If you modify this function, ensure consistency with that code. An
+ * assertion cross-checks that both remain in agreement. Do not introduce new
+ * side-effects.
+ */
+static bool
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+							   TransactionId OldestXmin,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid,
+							   OffsetNumber *logging_offnum)
+{
 	Page		page = BufferGetPage(buf);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
 	bool		all_visible = true;
+	int			matched_dead_count = 0;
 
 	*visibility_cutoff_xid = InvalidTransactionId;
 	*all_frozen = true;
 
+	Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+	/* Confirm input deadoffsets[] is strictly sorted */
+	if (ndeadoffsets > 1)
+	{
+		for (int i = 1; i < ndeadoffsets; i++)
+			Assert(deadoffsets[i - 1] < deadoffsets[i]);
+	}
+#endif
+
 	maxoff = PageGetMaxOffsetNumber(page);
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff && all_visible;
@@ -3496,9 +3564,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 		 */
 		if (ItemIdIsDead(itemid))
 		{
-			all_visible = false;
-			*all_frozen = false;
-			break;
+			if (!deadoffsets ||
+				matched_dead_count >= ndeadoffsets ||
+				deadoffsets[matched_dead_count] != offnum)
+			{
+				*all_frozen = all_visible = false;
+				break;
+			}
+			matched_dead_count++;
+			continue;
 		}
 
 		Assert(ItemIdIsNormal(itemid));
-- 
2.43.0
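The key change in heap_page_would_be_all_visible() is the single-cursor walk over the sorted deadoffsets array: an LP_DEAD item is tolerated only if it is exactly the next offset the caller is about to reap. The sketch below isolates that matching; ItemKind and would_be_all_visible() are hypothetical simplifications (real visibility is decided per tuple against OldestXmin, which is collapsed here into ITEM_VISIBLE and ITEM_NOT_VISIBLE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;
#define FirstOffsetNumber ((OffsetNumber) 1)

/* Hypothetical simplified line pointer states. */
typedef enum ItemKind
{
    ITEM_UNUSED,
    ITEM_DEAD,
    ITEM_VISIBLE,       /* normal tuple visible to everyone */
    ITEM_NOT_VISIBLE    /* normal tuple not yet visible to everyone */
} ItemKind;

/* items[offnum - 1] describes offnum; deadoffsets must be strictly ascending. */
static bool
would_be_all_visible(const ItemKind *items, OffsetNumber maxoff,
                     const OffsetNumber *deadoffsets, int ndeadoffsets)
{
    int matched_dead_count = 0;

    assert(ndeadoffsets == 0 || deadoffsets != NULL);
    for (int i = 1; i < ndeadoffsets; i++)
        assert(deadoffsets[i - 1] < deadoffsets[i]);

    for (OffsetNumber offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
    {
        ItemKind kind = items[offnum - 1];

        if (kind == ITEM_UNUSED)
            continue;
        if (kind == ITEM_DEAD)
        {
            /* Only dead items the caller is about to reap are tolerated. */
            if (deadoffsets == NULL ||
                matched_dead_count >= ndeadoffsets ||
                deadoffsets[matched_dead_count] != offnum)
                return false;
            matched_dead_count++;
            continue;
        }
        if (kind == ITEM_NOT_VISIBLE)
            return false;
        /* ITEM_VISIBLE: nothing to do, keep scanning. */
    }
    return true;
}
```

Passing NULL and 0 reproduces the heap_page_is_all_visible() wrapper's behaviour, where any LP_DEAD item makes the page not all-visible. Requiring sorted input is what lets one cursor replace a search per dead item.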



  [text/x-patch] v16-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.5K, 9-v16-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From 3e79e84930ba110a0dbf4abe6b3c84f3c021c78a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v16 08/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cf1c2efc999..cf9de40ff3c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,9 +1877,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1896,13 +1899,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 RelationGetRelationName(vacrel->rel));
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v16-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (26.4K, 10-v16-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From d32451ace53d97e8e11deb12c87655c6e937ee0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v16 09/14] Remove XLOG_HEAP2_VISIBLE entirely

Remove the XLOG_HEAP2_VISIBLE record type, since no remaining code emits it.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |  18 ++-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  15 +--
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 56 insertions(+), 377 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(relation));
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(relation));
 		}
 
 		/*
@@ -8798,50 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 911416bbc56..69d1f0b8633 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -258,7 +258,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+		old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
 
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
@@ -276,142 +276,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -789,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -805,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 
 		/* We don't have relation name during recovery, so use relfilenode */
 		relname = psprintf("%u", rlocator.relNumber);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 relname);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  relname);
 
 		PageSetLSN(BufferGetPage(vmbuffer), lsn);
 		pfree(relname);
@@ -1390,9 +1254,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f384d74416a..142781d0008 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1030,9 +1030,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		{
 			Assert(PageIsAllVisible(page));
 
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  RelationGetRelationName(params->relation));
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   RelationGetRelationName(params->relation));
 			if (old_vmbits == new_vmbits)
 			{
 				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2309,14 +2309,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
+ *
+ * These changes often happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cf9de40ff3c..bed77af23a2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1899,11 +1899,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 RelationGetRelationName(vacrel->rel));
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  RelationGetRelationName(vacrel->rel));
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2783,9 +2783,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
 	{
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 RelationGetRelationName(vacrel->rel));
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  RelationGetRelationName(vacrel->rel));
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 738105eb97e..dfa6113f0a9 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,107 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (XLogRecPtrIsInvalid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
 
 /*
  * Set visibility map (VM) flags in the block referenced by vmBuf.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * heapRelname is used only for debugging.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const char *heapRelname)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const char *heapRelname);
+
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a626d633d5..48eb3cf4466 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4272,7 +4272,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



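After this patch, the only remaining visibilitymap_set() just sets bits in a VM buffer. The callers in this series have already pinned and locked that buffer. The function returns the bits that were set before the call, and heap_xlog_prune_freeze() uses that return value to decide whether to bump the VM page LSN.

The C sketch below models that contract on a plain byte array. It assumes the usual layout of two VM bits per heap block, which gives four heap blocks per map byte. It ignores map-page boundaries, buffer locking, and WAL. vm_set_sketch() and the SKETCH_* macros are hypothetical stand-ins; the flag checks mirror those in the removed visibilitymap_set() shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 2 VM bits per heap block, so 4 heap blocks per map byte. */
#define SKETCH_ALL_VISIBLE			0x01
#define SKETCH_ALL_FROZEN			0x02
#define SKETCH_VALID_BITS			0x03
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_HEAPBLOCKS_PER_BYTE	4

/*
 * Set flags for heapBlk in map[] and return the bits that were set before.
 * Bits are only ever OR'ed in here; clearing is a separate operation.
 */
static uint8_t
vm_set_sketch(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = heapBlk / SKETCH_HEAPBLOCKS_PER_BYTE;
	uint8_t		mapOffset = (heapBlk % SKETCH_HEAPBLOCKS_PER_BYTE) *
		SKETCH_BITS_PER_HEAPBLOCK;
	uint8_t		status;

	assert((flags & SKETCH_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible. */
	assert(flags != SKETCH_ALL_FROZEN);

	status = (map[mapByte] >> mapOffset) & SKETCH_VALID_BITS;
	if (flags != status)
		map[mapByte] |= (uint8_t) (flags << mapOffset);
	return status;
}
```

A caller that gets back exactly the flags it asked for knows the VM page was not modified. It can then skip advancing the page LSN, as heap_xlog_prune_freeze() does above.
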
  [text/x-patch] v16-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.2K, 11-v16-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 1e4108e0c5b007fe55f12c29f4a47247ba023ef9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v16 10/14] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
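
As a rough illustration of the semantics, the C sketch below models the check with 64-bit transaction IDs:
- An XID below the horizon is visible to all.
- An XID at or beyond the conservative bound is not.
- Anything in between triggers a horizon refresh, followed by a re-check.

VisHorizonSketch, xid_visible_to_all_sketch() and the recompute callback are hypothetical simplifications. They are loosely modeled on GlobalVisState's maybe_needed/definitely_needed bounds, and the real code may skip the refresh rather than performing it unconditionally. The same predicate serves both uses described above: testing an xmax for removability and testing an xmin for all-visibility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for GlobalVisState (64-bit XIDs). */
typedef struct VisHorizonSketch
{
	uint64_t	maybe_needed;		/* XIDs below this are visible to all */
	uint64_t	definitely_needed;	/* XIDs at or above this may be running */
	uint64_t	(*recompute) (void);	/* hypothetical: fresh horizon */
} VisHorizonSketch;

static bool
xid_visible_to_all_sketch(VisHorizonSketch *state, uint64_t fxid)
{
	assert(state->maybe_needed <= state->definitely_needed);

	if (fxid < state->maybe_needed)
		return true;
	if (fxid >= state->definitely_needed)
		return false;

	/* Uncertain window: refresh the horizon and re-check. */
	state->maybe_needed = state->recompute();
	if (state->definitely_needed < state->maybe_needed)
		state->definitely_needed = state->maybe_needed;
	return fxid < state->maybe_needed;
}

/* Example callback: pretend the refreshed horizon is XID 200. */
static uint64_t
horizon_after_refresh(void)
{
	return 200;
}
```
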

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 16 ++++++++--------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 142781d0008..78e04f1d17c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -729,9 +729,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
-	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
-	 * transaction aborts.
+	 * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+	 * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+	 * aborts.
 	 *
 	 * It's also good for performance. Most commonly tuples within a page are
 	 * stored at decreasing offsets (while the items are stored at increasing
@@ -1154,11 +1154,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1616,7 +1616,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
+				 * could use GlobalVisXidVisibleToAll() instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v16-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.5K, 12-v16-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From a28aef72286f446c53614621ebe7f8b65ee4b59b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v16 11/14] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
 src/backend/access/heap/pruneheap.c         | 37 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 17 +++++-----
 src/include/access/heapam.h                 |  7 ++--
 4 files changed, 57 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 78e04f1d17c..e5b16bd2b38 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -711,11 +711,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate.visibility_cutoff_xid = InvalidTransactionId;
 
@@ -911,6 +912,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -1081,10 +1092,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		if (!heap_page_is_all_visible(params->relation, buffer,
-									  prstate.cutoffs->OldestXmin,
+									  prstate.vistest,
 									  &debug_all_frozen,
 									  &debug_cutoff, off_loc))
 			Assert(false);
@@ -1613,19 +1623,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisXidVisibleToAll() instead, if a
-				 * non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bed77af23a2..3af8a359e42 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2739,7 +2739,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3488,14 +3488,13 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3514,7 +3513,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3533,7 +3532,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3605,7 +3604,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		tuple.t_len = ItemIdGetLength(itemid);
 		tuple.t_tableOid = RelationGetRelid(rel);
 
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3624,7 +3623,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2de39ba0cd1..df0632aebc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
 	/*
 	 * cutoffs contains the freeze cutoffs, established by VACUUM at the
 	 * beginning of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE
-	 * option is set. cutoffs->OldestXmin is also used to determine if dead
-	 * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+	 * option is set.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -443,7 +442,7 @@ extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 
 extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
@@ -455,6 +454,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v16-0012-Inline-TransactionIdFollows-Precedes.patch (5.0K, 13-v16-0012-Inline-TransactionIdFollows-Precedes.patch)
From 9ed00b821b89276c80382bc810e6a3368cc35521 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v16 12/14] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
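
Here is a worked example of the modulo-2^32 comparison being inlined. It
is a standalone copy of the TransactionIdPrecedes() logic, and the names
xid_precedes() and xid_precedes_examples() are hypothetical. As a static
inline function in a header, the comparison can be compiled straight into
hot callers such as the pruning loop, instead of costing an out-of-line
call per xid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Same logic as the inlined TransactionIdPrecedes(). */
static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Permanent xids (0, 1, 2) compare as plain unsigned values. */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;

	/* Normal xids: the sign of the wrapped 32-bit difference decides. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}

static void
xid_precedes_examples(void)
{
	/* Ordinary case. */
	assert(xid_precedes(100, 200));
	assert(!xid_precedes(200, 100));

	/*
	 * Across wraparound: 4294967290 - 5 wraps to -11 as int32, so an xid
	 * assigned just before the counter wrapped precedes xid 5.
	 */
	assert(xid_precedes(4294967290u, 5));
	assert(!xid_precedes(5, 4294967290u));

	/* FrozenTransactionId (2) precedes every normal xid. */
	assert(xid_precedes(2, 3));
}
```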

Reviewed-by: Kirill Reshke <[email protected]>
---
 src/backend/access/transam/transam.c | 64 -------------------------
 src/include/access/transam.h         | 70 ++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
 							   TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
 }
 
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
-	/*
-	 * If either ID is a permanent XID then we can just do unsigned
-	 * comparison.  If both are normal, do a modulo-2^32 comparison.
-	 */
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 < id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 <= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 > id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
-	int32		diff;
-
-	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
-		return (id1 >= id2);
-
-	diff = (int32) (id1 - id2);
-	return (diff >= 0);
-}
-
 
 /*
  * TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
 } TransamVariablesData;
 
 
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+	/*
+	 * If either ID is a permanent XID then we can just do unsigned
+	 * comparison.  If both are normal, do a modulo-2^32 comparison.
+	 */
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 < id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 <= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 > id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+	int32		diff;
+
+	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+		return (id1 >= id2);
+
+	diff = (int32) (id1 - id2);
+	return (diff >= 0);
+}
+
+
 /* ----------------
  *		extern declarations
  * ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
 extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
-- 
2.43.0



  [text/x-patch] v16-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 14-v16-0014-Set-pd_prune_xid-on-insert.patch)
From bd82158f3836798a6ea9194e70e33b93980fbbde Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v16 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run the first time a page
filled with newly inserted tuples is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
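
The following minimal sketch shows how the pd_prune_xid hint behaves. It
is modeled on the PageSetPrunable() macro in bufpage.h and on the first
checks in heap_page_prune_opt(). SketchPageHeader and the sketch_
functions are hypothetical, and the horizon argument stands in for the
GlobalVisState test. The real heap_page_prune_opt() also prunes only when
the page is full or low on free space. The assertion that the xid is
normal is presumably why the patch skips PageSetPrunable() in bootstrap
mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsValid(xid)	((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Hypothetical cut-down page header: only the field this sketch needs. */
typedef struct SketchPageHeader
{
	TransactionId pd_prune_xid;	/* oldest xid that may make pruning useful */
} SketchPageHeader;

static bool
xid_precedes(TransactionId a, TransactionId b)
{
	if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* Modeled on PageSetPrunable(): keep the oldest candidate xid. */
static void
sketch_set_prunable(SketchPageHeader *page, TransactionId xid)
{
	assert(TransactionIdIsNormal(xid));
	if (!TransactionIdIsValid(page->pd_prune_xid) ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Modeled on the early exits in heap_page_prune_opt(): with no hint, or a
 * hint not yet visible to all, on-access pruning is skipped.
 */
static bool
sketch_should_try_prune(const SketchPageHeader *page, TransactionId horizon)
{
	TransactionId prune_xid = page->pd_prune_xid;

	if (!TransactionIdIsValid(prune_xid))
		return false;
	return xid_precedes(prune_xid, horizon);
}
```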

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../isolation/expected/index-killtuples.out   |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 69d1f0b8633..51f7961075f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -475,6 +475,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -624,9 +630,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



  [text/x-patch] v16-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.9K, 15-v16-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 13ff9fd8071f9b7aea07cca603c51a9a3cd659f1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v16 13/14] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
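
The plumbing described above can be sketched as follows. The executor
decides per relation whether a scan may set the VM. The heap scan then
hands heap_page_prune_opt() either a slot for the VM buffer pin or NULL,
as in the heap_prepare_pagescan() hunk. SketchScan, sketch_scan_flags(),
the bit value and the integer buffer type are all hypothetical. Only the
SO_ALLOW_VM_SET name and the NULL-versus-&rs_vmbuffer convention come from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int SketchBuffer;			/* hypothetical: 0 means "no buffer" */

#define SKETCH_INVALID_BUFFER	0
#define SKETCH_SO_ALLOW_VM_SET	(1u << 0)	/* hypothetical bit value */

typedef struct SketchScan
{
	uint32_t	flags;
	SketchBuffer vmbuffer;		/* VM page pin kept across heap pages */
} SketchScan;

/* Executor side: allow VM setting only if the query does not modify it. */
static uint32_t
sketch_scan_flags(bool relation_is_modified)
{
	return relation_is_modified ? 0 : SKETCH_SO_ALLOW_VM_SET;
}

/*
 * Scan side: NULL presumably tells the pruning code to prune without
 * touching the VM; otherwise it may keep a VM page pinned in the slot.
 */
static SketchBuffer *
sketch_vmbuffer_for_prune(SketchScan *scan)
{
	assert(scan != NULL);
	if (scan->flags & SKETCH_SO_ALLOW_VM_SET)
		return &scan->vmbuffer;
	return NULL;
}

/* End of scan, like heap_endscan(): drop the pin if one was taken. */
static void
sketch_end_scan(SketchScan *scan)
{
	if (scan->vmbuffer != SKETCH_INVALID_BUFFER)
		scan->vmbuffer = SKETCH_INVALID_BUFFER;	/* stands in for ReleaseBuffer() */
}
```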
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 71 +++++++++++++++----
 src/backend/access/index/indexam.c            | 46 ++++++++++++
 src/backend/access/table/tableam.c            | 39 ++++++++--
 src/backend/executor/execMain.c               |  4 ++
 src/backend/executor/execUtils.c              |  2 +
 src/backend/executor/nodeBitmapHeapscan.c     |  7 +-
 src/backend/executor/nodeIndexscan.c          | 18 +++--
 src/backend/executor/nodeSeqscan.c            | 24 +++++--
 src/include/access/genam.h                    | 11 +++
 src/include/access/heapam.h                   | 24 ++++++-
 src/include/access/relscan.h                  |  6 ++
 src/include/access/tableam.h                  | 30 +++++++-
 src/include/nodes/execnodes.h                 |  6 ++
 .../t/035_standby_logical_decoding.pl         |  3 +-
 16 files changed, 282 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
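Keeping rs_vmbuffer pinned across pages, and releasing it only in heap_rescan() and heap_endscan(), is cheap because one visibility map page covers many heap blocks, and visibilitymap_pin() keeps the existing pin when it already covers the requested block. The following is a minimal standalone sketch of that arithmetic, not PostgreSQL code. It assumes the default 8 kB BLCKSZ, a 24-byte page header and 2 VM bits per heap block (mirroring HEAPBLOCKS_PER_PAGE in visibilitymap.c); all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed constants mirroring visibilitymap.c with the default 8 kB BLCKSZ:
 * the VM page body after a 24-byte page header, 2 bits per heap block.
 */
#define SKETCH_BLCKSZ				8192
#define SKETCH_PAGE_HEADER			24
#define SKETCH_BITS_PER_HEAPBLOCK	2
#define SKETCH_MAPSIZE				(SKETCH_BLCKSZ - SKETCH_PAGE_HEADER)
#define SKETCH_HEAPBLOCKS_PER_PAGE \
	(SKETCH_MAPSIZE * (8 / SKETCH_BITS_PER_HEAPBLOCK))
#define SKETCH_NO_PIN				UINT32_MAX	/* like InvalidBuffer */

/* VM page that holds the bits for heap block heap_blk. */
static uint32_t
sketch_heapblk_to_mapblock(uint32_t heap_blk)
{
	return heap_blk / SKETCH_HEAPBLOCKS_PER_PAGE;
}

/*
 * Model of visibilitymap_pin(): keep the cached pin if it already covers
 * heap_blk, otherwise drop it and pin the right VM page.
 * Returns 1 if a new pin was taken, 0 if the cached pin was reused.
 */
static int
sketch_vm_pin(uint32_t heap_blk, uint32_t *pinned_mapblk)
{
	uint32_t	mapblk = sketch_heapblk_to_mapblock(heap_blk);

	if (*pinned_mapblk == mapblk)
		return 0;
	*pinned_mapblk = mapblk;
	return 1;
}

/* VM pins taken by a scan that visits blocks[0..n-1] in the given order. */
static int
sketch_count_vm_pins(const uint32_t *blocks, int n)
{
	uint32_t	pinned = SKETCH_NO_PIN;	/* scan starts without a VM pin */
	int			pins = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		pins += sketch_vm_pin(blocks[i], &pinned);
	return pins;
}

/* VM pins taken by a sequential scan of heap blocks 0..nblocks-1. */
static int
sketch_count_vm_pins_seq(uint32_t nblocks)
{
	uint32_t	pinned = SKETCH_NO_PIN;
	int			pins = 0;

	for (uint32_t blk = 0; blk < nblocks; blk++)
		pins += sketch_vm_pin(blk, &pinned);
	return pins;
}
```

Under these assumptions one VM page covers 32,672 heap blocks, so a sequential scan of 100,000 blocks takes four VM pins, while an access pattern that alternates between distant heap blocks, as an index scan may, can need a new pin on almost every fetch.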
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_ALLOW_VM_SET)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
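The index fetch path expresses the same choice differently: heapam_index_fetch_tuple() hands heap_page_prune_opt() either NULL (the query modifies the base relation) or &xs_vmbuffer, and heapam_index_fetch_reset() drops any VM pin left behind. This minimal sketch models that lifecycle with a global pin counter so that a leaked pin would be visible. The sketch_* names and the integer stand-ins for Buffer are hypothetical, and the sketch ignores the real code's check that pruning is only attempted when the fetch moves to a different heap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int SketchBuffer;			/* stand-in for Buffer */
#define SKETCH_INVALID_BUFFER 0		/* stand-in for InvalidBuffer */

static int	sketch_pins_held = 0;	/* total VM pins held, to spot leaks */

/* Stand-in for the VM-related fields of an index fetch descriptor. */
typedef struct SketchIndexFetch
{
	bool		modifies_base_rel;
	SketchBuffer xs_vmbuffer;
} SketchIndexFetch;

/* Like heapam_index_fetch_begin() followed by index_beginscan_vmset(). */
static void
sketch_fetch_begin(SketchIndexFetch *f, bool modifies_base_rel)
{
	f->modifies_base_rel = modifies_base_rel;
	f->xs_vmbuffer = SKETCH_INVALID_BUFFER;
}

/*
 * Like heap_page_prune_opt() receiving a vmbuffer pointer: NULL means the
 * caller does not allow setting the VM, so no VM page is pinned.
 * vm_page must be a valid (nonzero) buffer id.
 */
static void
sketch_prune_opt(SketchBuffer *vmbuffer, SketchBuffer vm_page)
{
	assert(vm_page != SKETCH_INVALID_BUFFER);
	if (vmbuffer == NULL || *vmbuffer == vm_page)
		return;
	if (*vmbuffer != SKETCH_INVALID_BUFFER)
		sketch_pins_held--;		/* release the old VM pin */
	*vmbuffer = vm_page;
	sketch_pins_held++;
}

/* Like heapam_index_fetch_tuple() arriving on a new heap page. */
static void
sketch_fetch_tuple(SketchIndexFetch *f, SketchBuffer vm_page)
{
	sketch_prune_opt(f->modifies_base_rel ? NULL : &f->xs_vmbuffer, vm_page);
}

/* Like heapam_index_fetch_reset(): drop any VM pin we still hold. */
static void
sketch_fetch_reset(SketchIndexFetch *f)
{
	if (f->xs_vmbuffer != SKETCH_INVALID_BUFFER)
	{
		sketch_pins_held--;
		f->xs_vmbuffer = SKETCH_INVALID_BUFFER;
	}
}
```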
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e5b16bd2b38..fa3b38cdadc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -186,7 +186,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *vmflags,
 								   bool *do_set_pd_vis);
 
@@ -201,9 +203,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -269,12 +275,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params;
 			PruneFreezeResult presult;
 
+			params.options = 0;
+			params.vmbuffer = InvalidBuffer;
+
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			params.relation = relation;
 			params.buffer = buffer;
 			params.reason = PRUNE_ON_ACCESS;
 			params.vistest = vistest;
 			params.cutoffs = NULL;
-			params.vmbuffer = InvalidBuffer;
 			params.blk_known_av = false;
 
 			/*
@@ -455,6 +470,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * have examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
  * should be set on the heap page.
@@ -465,7 +483,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *vmflags,
 					   bool *do_set_pd_vis)
 {
@@ -481,6 +501,23 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we are neither pruning nor freezing,
+	 * don't set the visibility map if doing so would newly dirty the heap
+	 * page or, for a page that is already dirty, would require emitting a
+	 * full-page image (FPI) of the heap page in the WAL. This should be
+	 * rare, since on-access pruning is only attempted when pd_prune_xid is
+	 * valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = prstate->all_frozen = false;
+		return false;
+	}
+
 	if (prstate->all_visible && !PageIsAllVisible(heap_page))
 		*do_set_pd_vis = true;
 
@@ -504,6 +541,9 @@ heap_page_will_set_vis(Relation relation,
 	 * page-level bit is clear.  However, it's possible that in vacuum the bit
 	 * got cleared after heap_vac_scan_next_block() was called, so we must
 	 * recheck with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * XXX: This will never trigger for on-access pruning because it passes
+	 * blk_known_av as false. Should we remove that condition here?
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -912,6 +952,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.ndead > 0 ||
 		prstate.nunused > 0;
 
+	/*
+	 * Even if we don't prune anything, if we found a new value for the
+	 * pd_prune_xid field or the page was marked full, we will update the hint
+	 * bit.
+	 */
+	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(page);
+
 	/*
 	 * After processing all the live tuples on the page, if the newest xmin
 	 * amongst them is not visible to everyone, the page cannot be
@@ -922,14 +970,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
 		prstate.all_visible = prstate.all_frozen = false;
 
-	/*
-	 * Even if we don't prune anything, if we found a new value for the
-	 * pd_prune_xid field or the page was marked full, we will update the hint
-	 * bit.
-	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
-
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
@@ -973,6 +1013,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2245,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
 
 /*
  * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
  */
 static TransactionId
 get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2314,8 +2355,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III, and during on-access pruning,
+ *   the heap page may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
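The early exit added to heap_page_will_set_vis() boils down to one predicate over the prune reason and the page state. Below is a minimal sketch restating it as a pure function, assuming buffer_dirty and needs_fpi stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup() on the heap buffer; the enum and function names are hypothetical. When the predicate holds, the real code also clears prstate->all_visible and all_frozen before returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of PruneReason. */
typedef enum SketchPruneReason
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN,
	SKETCH_PRUNE_VACUUM_CLEANUP,
} SketchPruneReason;

/*
 * Restates the on-access early exit in heap_page_will_set_vis(): returns
 * true when setting the VM should be skipped even though the page is
 * all-visible.  buffer_dirty stands in for BufferIsDirty(heap_buf) and
 * needs_fpi for XLogCheckBufferNeedsBackup(heap_buf).
 */
static bool
sketch_skip_vm_set_on_access(SketchPruneReason reason, bool all_visible,
							 bool do_prune, bool do_freeze,
							 bool buffer_dirty, bool needs_fpi)
{
	return reason == SKETCH_PRUNE_ON_ACCESS &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi);	/* would dirty the page or add an FPI */
}
```

In words: an on-access call that neither prunes nor freezes only sets the VM when the heap page is already dirty and no full-page image would be needed; vacuum is never subject to this check.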
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
 	return scan;
 }
 
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+					  Relation indexRelation,
+					  Snapshot snapshot,
+					  IndexScanInstrumentation *instrument,
+					  int nkeys, int norderbys, bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan(heapRelation,
+						   indexRelation,
+						   snapshot,
+						   instrument,
+						   nkeys, norderbys);
+
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+	return scan;
+}
+
 /*
  * index_beginscan_bitmap - start a scan of an index with amgetbitmap
  *
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	return scan;
 }
 
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+							   IndexScanInstrumentation *instrument,
+							   int nkeys, int norderbys,
+							   ParallelIndexScanDesc pscan,
+							   bool modifies_base_rel)
+{
+	IndexScanDesc scan;
+
+	scan = index_beginscan_parallel(heaprel, indexrel,
+									instrument,
+									nkeys, norderbys,
+									pscan);
+	scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+	return scan;
+}
+
 /* ----------------
  * index_getnext_tid - get the next TID from a scan
  *
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+													 uint32 flags);
+
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 	}
 }
 
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+								uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
-		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 											pscan, flags);
 }
 
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+							   bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* Conservatively treat any rowmarked relation as modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
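Both InitPlan() (for rowmarks) and ExecInitResultRelation() record range-table indexes in es_modified_relids, and each scan node later tests its own scanrelid against that set. Here is a minimal sketch of that bookkeeping, using a fixed-size toy bitset in place of the real Bitmapset API (bms_add_member()/bms_is_member()). The type, the 256-member limit and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy fixed-size stand-in for a Bitmapset of range-table indexes; the
 * executor itself uses bms_add_member()/bms_is_member().
 */
#define SKETCH_MAX_RTI 256

typedef struct SketchRelids
{
	uint64_t	words[SKETCH_MAX_RTI / 64];
} SketchRelids;

/* Record that the relation at RT index rti is modified by the query. */
static void
sketch_add_member(SketchRelids *set, int rti)
{
	assert(rti >= 0 && rti < SKETCH_MAX_RTI);
	set->words[rti / 64] |= UINT64_C(1) << (rti % 64);
}

/* Unlike bms_is_member(), out-of-range indexes simply report false. */
static bool
sketch_is_member(const SketchRelids *set, int rti)
{
	if (rti < 0 || rti >= SKETCH_MAX_RTI)
		return false;
	return (set->words[rti / 64] >> (rti % 64)) & 1;
}

/* What each scan node computes before starting its scan. */
static bool
sketch_scan_may_set_vm(const SketchRelids *modified, int scanrelid)
{
	return !sketch_is_member(modified, scanrelid);
}
```

For instance, with a result relation at RT index 1 and a rowmark at RT index 3, scans of RT indexes 1 and 3 would not attempt to set the VM, while a scan of RT index 2 would.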
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  node->ss.ps.state->es_modified_relids);
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   modifies_rel);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+
+		bool		modifies_base_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
-		scandesc = index_beginscan(node->ss.ss_currentRelation,
-								   node->iss_RelationDesc,
-								   estate->es_snapshot,
-								   &node->iss_Instrument,
-								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+		scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+										 node->iss_RelationDesc,
+										 estate->es_snapshot,
+										 &node->iss_Instrument,
+										 node->iss_NumScanKeys,
+										 node->iss_NumOrderByKeys,
+										 modifies_base_rel);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		bool		modifies_rel =
+			bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						  estate->es_modified_relids);
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
-								   0, NULL);
+		scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+										 estate->es_snapshot,
+										 0, NULL, modifies_rel);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
+	bool		modifies_rel;
 	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+								 estate->es_modified_relids);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+									   modifies_rel);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	bool		modifies_rel =
+		bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					  node->ss.ps.state->es_modified_relids);
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+									   pscan,
+									   modifies_rel);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+										   Relation indexRelation,
+										   Snapshot snapshot,
+										   IndexScanInstrumentation *instrument,
+										   int nkeys, int norderbys, bool modifies_base_rel);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+													IndexScanInstrumentation *instrument,
+													int nkeys, int norderbys,
+													ParallelIndexScanDesc pscan,
+													bool modifies_base_rel);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index df0632aebc6..59d8ce9ad42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read the VM page corresponding
+	 * to the current heap page into this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read the VM page corresponding to the current heap page
+	 * into this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* whether or not scan should attempt to set the VM */
+	SO_ALLOW_VM_SET = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+	uint32		flags = SO_TYPE_SEQSCAN |
+		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
 /*
  * Like table_beginscan(), but for scanning catalog. It'll automatically use a
  * snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, bool modifies_rel)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (!modifies_rel)
+		flags |= SO_ALLOW_VM_SET;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
 }
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
 extern TableScanDesc table_beginscan_parallel(Relation relation,
 											  ParallelTableScanDesc pscan);
 
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+													ParallelTableScanDesc pscan,
+													bool modifies_rel);
+
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
  * responsible for making sure that all workers have finished the scan
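For sequential and bitmap heap scans, the executor's answer is folded into the scan's option flags, and the heap AM turns SO_ALLOW_VM_SET back into a non-NULL vmbuffer pointer for heap_page_prune_opt(). The following minimal sketch mirrors that round trip. Only SO_ALLOW_VM_SET = 1 << 10 is taken from this patch; the other bit positions are assumed to match the existing ScanOptions enum, and the sketch_* names and the int stand-in for Buffer are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of ScanOptions; only SO_ALLOW_VM_SET's value comes from this patch. */
typedef enum SketchScanOptions
{
	SKETCH_SO_TYPE_SEQSCAN = 1 << 0,
	SKETCH_SO_TYPE_BITMAPSCAN = 1 << 1,
	SKETCH_SO_ALLOW_STRAT = 1 << 6,
	SKETCH_SO_ALLOW_SYNC = 1 << 7,
	SKETCH_SO_ALLOW_PAGEMODE = 1 << 8,
	SKETCH_SO_ALLOW_VM_SET = 1 << 10,
} SketchScanOptions;

/* Flags built by table_beginscan_vmset()/table_beginscan_parallel_vmset(). */
static uint32_t
sketch_seqscan_flags(bool modifies_rel)
{
	uint32_t	flags = SKETCH_SO_TYPE_SEQSCAN |
		SKETCH_SO_ALLOW_STRAT | SKETCH_SO_ALLOW_SYNC | SKETCH_SO_ALLOW_PAGEMODE;

	if (!modifies_rel)
		flags |= SKETCH_SO_ALLOW_VM_SET;
	return flags;
}

/* Flags built by table_beginscan_bm(). */
static uint32_t
sketch_bitmapscan_flags(bool modifies_rel)
{
	uint32_t	flags = SKETCH_SO_TYPE_BITMAPSCAN | SKETCH_SO_ALLOW_PAGEMODE;

	if (!modifies_rel)
		flags |= SKETCH_SO_ALLOW_VM_SET;
	return flags;
}

/*
 * What heap_prepare_pagescan() and BitmapHeapScanNextBlock() pass to
 * heap_page_prune_opt(): the scan's VM buffer slot, or NULL if the scan
 * may not set the VM.  int stands in for Buffer.
 */
static int *
sketch_vmbuffer_for_prune(uint32_t flags, int *scan_vmbuffer)
{
	return (flags & SKETCH_SO_ALLOW_VM_SET) ? scan_vmbuffer : NULL;
}
```

Expressing the restriction as an opt-in flag means existing callers of table_beginscan() and table_beginscan_parallel(), which never set SO_ALLOW_VM_SET, keep their current behavior.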
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations treated as modified by the query: result
+	 * relations of INSERT/UPDATE/DELETE/MERGE and any relation with a row mark
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0


