public inbox for [email protected]  
help / color / mirror / Atom feed
From: Melanie Plageman <[email protected]>
To: Andres Freund <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Chao Li <[email protected]>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: Wed, 3 Dec 2025 18:07:38 -0500
Message-ID: <CAAKRu_agCLAQ9OjmrTdJe-X=Xr7QnU4d=cfxdQGwc9jNx9w31w@mail.gmail.com> (raw)
In-Reply-To: <mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y@sy4ymcdtdklo>
References: <CAAKRu_ZP-3=SaZykpwDBMJOdUKyW3Wm5JZfPFRR3L5Ac8ouq4w@mail.gmail.com>
	<CAAKRu_bgkOQqu3K5n4YLRsNBZqJ9Rjg80ROqgKSr2UGz4b5hUg@mail.gmail.com>
	<2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl@mdyyqpqzxjek>
	<CAAKRu_YR-COJ9aGnMUQqt5yoWmUBjikqrd4jGNZYouHaXpis9g@mail.gmail.com>
	<CALdSSPhiCwJwWwgJP1NmqRmnp9RS2tGOBY0gQrfLCbB+OS5_KQ@mail.gmail.com>
	<CAAKRu_YS+Ocm=OzMaZnG4egFiE9v4VYfZ25DXd6jbwegqmGYbQ@mail.gmail.com>
	<CAAKRu_ZGZSqhGt-RcmmfiSheC+1fjQdxy6_+oM-1jMn8hyVptQ@mail.gmail.com>
	<CALdSSPg+B8RTzTXhJvCcKJBqgzhPZkq0E2oqxQdv74ZNZOMVzg@mail.gmail.com>
	<CAAKRu_Zha7HcdQBv8tTtQrcry5332J6kHnOc1X=TT03LzUXDow@mail.gmail.com>
	<CAAKRu_amF00f2T_H8N6pbbe75C22EeX1OqA=svpj8LNO1sdUuw@mail.gmail.com>
	<mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y@sy4ymcdtdklo>

Thanks for the review! All the small changes you suggested I made in
attached v23 unless otherwise noted below.

On Mon, Nov 24, 2025 at 5:24 PM Andres Freund <[email protected]> wrote:
>
> On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
> > Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
>
> One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
> argument, prune_freeze_setup() gets the entire prstate,
> heap_page_will_freeze() gets the Relation. It's what they need, but still a
> bit odd.

They all get the PruneState actually.

I've committed this patch (but actually have to do a follow-on commit
to silence coverity. Will do that next.)

> > Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
> >  prune/freeze
>
>
> Hm. This change makes sense, but unfortunately I find it somewhat hard to
> review. There are a lot of changes that don't obviously work towards one
> goal in this commit.

I've split up the first commit into 4 patches in attached v23
(0002-0005). They are not meant to be committed separately but are
separate only for ease of review. They comprise the logical steps for
getting to the final code state. I originally had it split up but got
feedback it was more work to review them each. So, let's see how this
goes.

> >@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
>
> >+     * vmbuffer is the buffer that must already contain contain the required
> >+     * block of the visibility map if we are to update it. blk_known_av is the
> >+     * visibility status of the heap block as of the last call to
> >+     * find_next_unskippable_block().
> >+     */
> >+    Buffer      vmbuffer;
> >+    bool        blk_known_av;
>
> What is blk_known_av set to if the block is known to not be all visible?
> Compared to the case where we did not yet determine the visibility status of
> the block?

blk_known_av should always be set to false if the caller doesn't know.
It is used as an optimization. I've added to the comment in this
struct to clarify that. More on this further down in my mail.

> > + * Decide whether to set the visibility map bits for heap_blk, using
> > + * information from PruneState and blk_known_av. Some callers may already
> > + * have examined this page’s VM bits (e.g., VACUUM in the previous
> > + * heap_vac_scan_next_block() call) and can pass that along.
>
> That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
> to me how the state of a block where did determine that the block is *not*
> all-visible is represented.

There is no need to distinguish between knowing it is not all-visible
and not knowing if it is all-visible. That is, "not known" and "known
not" are the same for our purposes. This is only an optimization and
not needed for correctness. I've tried to add comments to this effect
in various places where blk_known_av is used.

> > +     else if (blk_known_av && !PageIsAllVisible(heap_page) &&
> > +                      visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> > +     {
> > +             ereport(WARNING,
> > +                             (errcode(ERRCODE_DATA_CORRUPTED),
> > +                              errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> > +                                             RelationGetRelationName(relation), heap_blk)));
> > +
> > +             visibilitymap_clear(relation, heap_blk, vmbuffer,
> > +                                                     VISIBILITYMAP_VALID_BITS);
>
> Wait, why is it ok to perform this check iff blk_known_av is set?

This is existing logic in vacuum. It would be okay to perform the
check even if blk_known_av is false but might be too expensive for the
common case where the page is not all-visible (especially on-access).
The next vacuum should be able to enter this code path and fix it. Or
do you think it will be cheap enough because the caller will have read
in and pinned the VM page?

> > +                     old_vmbits = visibilitymap_set_vmbits(blockno,
> > +                                                                                               vmbuffer, new_vmbits,
> > +                                                                                               params->relation->rd_locator);
> > +                     if (old_vmbits == new_vmbits)
> > +                     {
> > +                             LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> > +                             /* Unset so we don't emit WAL since no change occurred */
> > +                             do_set_vm = false;
> > +                     }
> > +             }
>
> What can lead to this path being reached? Doesn't this mean that something
> changed the state of the VM while we were holding an exclusive lock on the
> heap buffer?

This shouldn't be in this commit (I've fixed that). However, it is
needed once we have on-access VM setting because we could have set the
page all-visible in the VM on-access in between when
find_next_unskippable_block() first checks the VM and sets
all_visible_according_to_vm/blk_known_av and when we take the lock and
prune/freeze the page.

> >                       log_heap_prune_and_freeze(params->relation, buffer,
> > -                                                                       InvalidBuffer,        /* vmbuffer */
> > -                                                                       0,    /* vmflags */
> > +                                                                       do_set_vm ? vmbuffer : InvalidBuffer,
> > +                                                                       do_set_vm ? new_vmbits : 0,
> >                                                                         conflict_xid,
> > -                                                                       true, params->reason,
> > +                                                                       true, /* cleanup lock */
> > +                                                                       do_set_pd_vis,
> > +                                                                       params->reason,
> >                                                                         prstate.frozen, prstate.nfrozen,
> >                                                                         prstate.redirected, prstate.nredirected,
> >                                                                         prstate.nowdead, prstate.ndead,
>
> This function is now taking 16 parameters :/

Is this complaint about readability or performance of parameter
passing? Because if it's the latter, I can't imagine that will be
noticeable when compared to the overhead of emitting a WAL record.

I could add a struct just for passing the parameters to the
log_heap_prune_and_freeze(). Something like:

typedef struct PruneFreezeChanges
{
    int            nredirected;
    int            ndead;
    int            nunused;
    int            nfrozen;
    OffsetNumber *redirected;
    OffsetNumber *nowdead;
    OffsetNumber *nowunused;
    HeapTupleFreeze *frozen;
} PruneFreezeChanges;

PruneFreezeChanges c = {
        .redirected = prstate.redirected,
        .nredirected = prstate.nredirected,
        .ndead = prstate.ndead,
        .nowdead = prstate.nowdead,
        .nunused = prstate.nunused,
        .nowunused = prstate.nowunused,
        .nfrozen = prstate.nfrozen,
        .frozen = prstate.frozen,
};

log_heap_prune_and_freeze(params->relation, buffer,
                                                        InvalidBuffer,
   /* vmbuffer */
                                                        0,    /* vmflags */
                                                        conflict_xid,
                                                        true, params->reason,
                                                        c);

However, I fear it is a bit confusing to have this struct just to pass
the parameters to the log_heap_prune_and_freeze(). We can't use that
struct inline in the PruneState because then we would need all the
arrays to be inline in the PruneFreezeChanges struct which would cause
4*MaxHeapTuplesPerPage stack allocated OffsetNumbers in vacuum phase
III than it currently has and needs.

The only other related parameters I see that could be stuck into a
struct are vmflags and set_pd_all_vis -- maybe called VisiChanges or
HeapPageVisiChanges. But again, I'm not sure if it is worth adding a
new struct for this.

> > +#ifdef USE_ASSERT_CHECKING
> > +     if (prstate.all_visible)
> > +     {
> > +             TransactionId debug_cutoff;
> > +             bool            debug_all_frozen;
> > +
> > +             Assert(prstate.lpdead_items == 0);
> > +             Assert(prstate.cutoffs);
> > +
> > +             if (!heap_page_is_all_visible(params->relation, buffer,
> > +                                                                       prstate.cutoffs->OldestXmin,
> > +                                                                       &debug_all_frozen,
> > +                                                                       &debug_cutoff, off_loc))
> > +                     Assert(false);
>
> I don't love Assert(false), because the message for the assert failure is
> pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
> () that has no body other than Assert(false)? Just Assert the expression
> directly.

This is existing code. I agree it's weird, but I remember Peter saying
something about why he did it this way that I no longer remember.
Anyway, 0001 changes the assert as you suggest.

> > Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
> >
> > As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
> > marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
>
> This whole business of treating empty pages as all-visible continues to not
> make any sense to me. Particularly in combination with a not crashsafe FSM it
> just seems ... unhelpful. It also means that there there's a decent chance of
> extra WAL when bulk extending. But that's not the fault of this change.

Is the argument for setting them av/af that we can skip them more
easily in future vacuums (i.e. not have to read in the page and take a
lock etc)?

> > Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
> >
> > As no remaining users emit XLOG_HEAP2_VISIBLE records.
> > This includes deleting the xl_heap_visible struct and all functions
> > responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
>
> Probably worth mentioning that this changes the VM API.

I've added a mention about this in the commit.
Are you imagining I have any comments anywhere about how
XLOG_HEAP2_VISIBLE used to exist?

I realized I need to bump XLOG_PAGE_MAGIC in this commit because the
code to replay XLOG_HEAP2_VISIBLE records is gone now.

What I'm not sure is if I have to bump it in some of the other commits
that change which WAL records are emitted by a particular operation
(e.g. not emitting a separate VM record from phase I of vacuum).

> > - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
> > + * - Page pruning, in vacuum phase I or on-access: Some items are
> >   *   redirected, some marked dead, and some removed altogether.
> >   *
> > - * - Freezing: Items are marked as 'frozen'.
> > + * - Freezing: During vacuum phase I, items are marked as 'frozen'
> >   *
> > - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
> > + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
> > + *   marked as unused.
> >   *
> > - * They have enough commonalities that we use a single WAL record for them
> > + * - VM updates: After vacuum phases I and III, the heap page may be marked
> > + *   all-visible and all-frozen.
> > + *
> > + * These changes all happen together, so we use a single WAL record for them
> >   * all.
> >   *
> >   * If replaying the record requires a cleanup lock, pass cleanup_lock =
> >   true.
>
> How's that related to the commit's subject?

Oops, I moved it to the relevant commit.

> > Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
> >  GlobalVisXidVisibleToAll()
> >
> > The function is currently only used to check whether a tuple’s xmax is
> > visible to all transactions (and thus removable). Upcoming changes will
> > also use it to test whether a tuple’s xmin is visible to all to
> > decide if a page can be marked all-visible in the visibility map.
> >
> > The new name, GlobalVisXidVisibleToAll(), better reflects this broader
> > purpose.
>
> If we want this - and I'm not convinced we do - I think it needs to go further
> and change the other uses of removable in
> procarray.c. ComputeXidHorizonsResult has a lot of related fields.
>
> There's also GetOldestNonRemovableTransactionId(),
> GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
> included in the renaming.

Okay, I see what you are saying. When you say you're not sure if we
want "this" -- do you mean using GlobalVisState for determining if
xmins are visible to all (which is required to set the VM on-access)
or do you mean renaming those functions?

If we're just talking about the renaming, looking at procarray.c, it
is full of the word "removable" because its functions were largely
used to examine and determine if everyone can see an xmax as committed
and thus if that tuple is removable from their perspective. But
nothing about the code that I can see means it has to be an xmax. We
could just as well use the functions to determine if everyone can see
an xmin as committed.

I don't see how we can leave the names as is and use it on xmins
because that tuple is _not_ removable based on testing if everyone can
see the xmin. So the function basically returns an incorrect result.

That being said, the problem with replacing "removable" with "visible
to all" -- which isn't _terrible_ -- means we have to replace
"nonremovable" with "not visible to all" -- which is terrible.

I think getting rid of "removable" from procarray.c would be an
improvement because that file feels tightly coupled to vacuum and
removing tuples because of the names of variables and functions when
actually its functionality isn't. So, the issue is coming up with
something palatable.

One alternative idea (that requires no renaming) is to add a wrapper
function somewhere outside procarray.c which invokes
GlobalVisTestIsRemovableXid() but is called something like
XidVisibleToAll() and is documented for use with xmins/etc. It would
avoid the messy work of coming up with a good name. What do you think?

> > Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
> >  visibility
> >
> > This also benefits vacuum directly: in some cases, GlobalVisState may
> > advance during a vacuum, allowing more pages to become considered
> > all-visible. And, in the future, we could easily add a heuristic to
> > update GlobalVisState more frequently during vacuums of large tables. In
> > the rare case that the GlobalVisState moves backward, vacuum falls back
> > to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
> > wasn’t yet prunable according to the GlobalVisState.
>
> I think it may be better to make sure that the GlobalVisState can't move
> backward.

Do you mean that I shouldn't use the GlobalVisState to determine
visibility until I make sure it can't move backwards?

There is actually no functional difference in my patch set with the
code this commit message refers to (in heap_prune_satisfies_vacuum()).
I only mentioned it to make sure folks knew that even though I was
widening usage of GlobalVisState, we wouldn't encounter
synchronization issues with freezing horizons.

> > Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
> >
> > Many queries do not modify the underlying relation. For such queries, if
> > on-access pruning occurs during the scan, we can check whether the page
> > has become all-visible and update the visibility map accordingly.
> > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> > all-frozen.
>
> > Supporting this requires passing information about whether the relation
> > is modified from the executor down to the scan descriptor.
>
> I think it'd be good to split this part into a separate commit. The set of
> folks to review that are distinct (and broader) from the ones looking at
> heapam internals.

Good point. I've split it into 3 commits in this patch set (0011-0013)

- Melanie


Attachments:

  [text/x-patch] v23-0001-Simplify-vacuum-visibility-assertion.patch (1.4K, 2-v23-0001-Simplify-vacuum-visibility-assertion.patch)
  download | inline diff:
From 7d51aaf9fea35367e36d143828412727a44d63d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 10:42:53 -0500
Subject: [PATCH v23 01/14] Simplify vacuum visibility assertion

Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.

Rewrite the assertion to avoid an Assert(false).

Suggested by Andres Freund.
---
 src/backend/access/heap/vacuumlazy.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65bb0568a86..984d5879947 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2028,10 +2028,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 		Assert(presult.lpdead_items == 0);
 
-		if (!heap_page_is_all_visible(vacrel->rel, buf,
-									  vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-									  &debug_cutoff, &vacrel->offnum))
-			Assert(false);
+		Assert(heap_page_is_all_visible(vacrel->rel, buf,
+										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.all_frozen == debug_all_frozen);
 
-- 
2.43.0



  [text/x-patch] v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch (12.3K, 3-v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch)
  download | inline diff:
From 7023583962f987cfde5450c8a2142574bb3ce84d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v23 02/14] Refactor lazy_scan_prune() VM set logic into helper

This commit is meant for ease of review only. It is a step towards
setting the VM in the same record as pruning and freezing in phase I of
vacuum. It isn't meant to be committed alone because it widens an
undesirable case where a heap buffer not marked dirty is stamped with an
LSN. If PD_ALL_VISIBLE is already set but the VM is not set, we won't
mark it dirty and then if checksums are enabled we will still stamp the
heap page LSN on a page not marked dirty.

Once the VM update is done in the same WAL record as pruning/freezing,
we will only set the LSN on the heap page if we set PD_ALL_VISIBLE or
made other heap page modifications.
---
 src/backend/access/heap/vacuumlazy.c | 283 ++++++++++++++-------------
 1 file changed, 146 insertions(+), 137 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 984d5879947..1cca095841e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1934,6 +1934,117 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
+
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and all_visible_according_to_vm. This
+ * function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bit and page hint. This does not need to be
+ * done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool all_visible_according_to_vm,
+					   const PruneFreezeResult *presult,
+					   uint8 *new_vmbits,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).
+	 *
+	 * We avoid relying on all_visible_according_to_vm as a proxy for the
+	 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+	 * stale.
+	 */
+	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+
+	/*
+	 * Determine what to set the visibility map bits to based on information
+	 * from the VM (as of last heap_vac_scan_next_block() call), and from
+	 * all_visible and all_frozen variables.
+	 */
+	if ((presult->all_visible && !all_visible_according_to_vm) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -1964,6 +2075,10 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
+	bool		do_set_vm = false;
+	bool		do_set_pd_vis = false;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
@@ -2075,28 +2190,22 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	/*
-	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last heap_vac_scan_next_block() call), and from all_visible and
-	 * all_frozen variables
-	 */
-	if (!all_visible_according_to_vm && presult.all_visible)
-	{
-		uint8		old_vmbits;
-		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+	do_set_vm = heap_page_will_set_vis(rel,
+									   blkno,
+									   buf,
+									   vmbuffer,
+									   all_visible_according_to_vm,
+									   &presult,
+									   &new_vmbits,
+									   &do_set_pd_vis);
 
-		if (presult.all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	if (do_set_pd_vis)
+	{
 		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
 		 * NB: If the heap page is all-visible but the VM bit is not set, we
 		 * don't need to dirty the heap page.  However, if checksums are
 		 * enabled, we do need to make sure that the heap page is dirtied
@@ -2104,136 +2213,36 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Given that this situation should only happen in rare cases after a
 		 * crash, it is not worth optimizing.
 		 */
-		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
+		PageSetAllVisible(page);
+	}
+
+	if (do_set_vm)
 		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
 									   InvalidXLogRecPtr,
 									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
-
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (presult.all_frozen)
-			{
-				vacrel->vm_new_visible_frozen_pages++;
-				*vm_page_frozen = true;
-			}
-		}
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-				 presult.all_frozen)
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
+									   new_vmbits);
 
 	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						vacrel->relname, blkno)));
-
-		PageClearAllVisible(page);
-		MarkBufferDirty(buf);
-		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * If the all-visible page is all-frozen but not marked as such yet, mark
-	 * it as all-frozen.
-	 */
-	else if (all_visible_according_to_vm && presult.all_frozen &&
-			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-	{
-		uint8		old_vmbits;
-
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
-		/*
-		 * Set the page all-frozen (and all-visible) in the VM.
-		 *
-		 * We can pass InvalidTransactionId as our cutoff_xid, since a
-		 * snapshotConflictHorizon sufficient to make everything safe for REDO
-		 * was logged when the page's tuples were frozen.
-		 */
-		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
-
-		/*
-		 * The page was likely already set all-visible in the VM. However,
-		 * there is a small chance that it was modified sometime between
-		 * setting all_visible_according_to_vm and checking the visibility
-		 * during pruning. Check the return value of old_vmbits anyway to
-		 * ensure the visibility map counters used for logging are accurate.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		vacrel->vm_new_visible_pages++;
+		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
-			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
-
-		/*
-		 * We already checked that the page was not set all-frozen in the VM
-		 * above, so we don't need to test the value of old_vmbits.
-		 */
-		else
-		{
-			vacrel->vm_new_frozen_pages++;
-			*vm_page_frozen = true;
-		}
+	}
+	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	{
+		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		vacrel->vm_new_frozen_pages++;
+		*vm_page_frozen = true;
 	}
 
 	return presult.ndeleted;
-- 
2.43.0



  [text/x-patch] v23-0003-Set-the-VM-in-prune-code.patch (26.0K, 4-v23-0003-Set-the-VM-in-prune-code.patch)
  download | inline diff:
From 40b506a888ef57f5b962b320b817b97e64c9c4c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v23 03/14] Set the VM in prune code

For review only, this moves the code to set the VM into
heap_page_prune_and_freeze() as a step toward having it in the same WAL
record.
---
 src/backend/access/heap/pruneheap.c  | 281 ++++++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c | 166 +---------------
 src/include/access/heapam.h          |  27 +++
 3 files changed, 272 insertions(+), 202 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5af84b4c875..0daf3abf717 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether or not to attempt updating the VM */
+	bool		attempt_update_vm;
 	struct VacuumCutoffs *cutoffs;
 
 	/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * can be used as the conflict horizon when setting the VM or when
+	 * freezing all the tuples on the page. It is only valid when all the live
+	 * tuples on the page are all-visible.
 	 *
 	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
 	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * decide whether to opportunistically freeze the page or not.  The
+	 * all_visible and all_frozen values ultimately used to set the VM are
+	 * adjusted to include LP_DEAD items after we determine whether or not to
+	 * opportunistically freeze.
 	 */
 	bool		all_visible;
 	bool		all_frozen;
@@ -191,6 +194,14 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vis(Relation relation,
+								   BlockNumber heap_blk,
+								   Buffer heap_buf,
+								   Buffer vmbuffer,
+								   bool blk_known_av,
+								   const PruneFreezeResult *presult,
+								   uint8 *new_vmbits,
+								   bool *do_set_pd_vis);
 
 
 /*
@@ -280,6 +291,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			PruneFreezeParams params = {
 				.relation = relation,
 				.buffer = buffer,
+				.vmbuffer = InvalidBuffer,
+				.blk_known_av = false,
 				.reason = PRUNE_ON_ACCESS,
 				.options = 0,
 				.vistest = vistest,
@@ -338,6 +351,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_update_vm =
+		(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
 	prstate->cutoffs = params->cutoffs;
 
 	/*
@@ -386,51 +401,54 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
+	 * Track whether the page could be marked all-visible and/or all-frozen.
+	 * This information is used for opportunistic freezing and for updating
+	 * the visibility map (VM) if requested by the caller.
 	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Visibility bookkeeping is required not just for setting the VM
+	 * bits, but also for opportunistic freezing: we only consider freezing if
+	 * the page would become all-frozen, or if it would be all-frozen except
+	 * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+	 * we will not set the VM bit even if the page is found to be all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+	 * prstate.all_frozen must be initialized to false, since we will not call
+	 * heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear all_visible
+	 * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+	 * them after deciding whether to freeze, but before updating the VM, to
+	 * avoid setting the VM bits incorrectly.
+	 *
+	 * If neither freezing nor VM updates are requested, we skip the extra
+	 * bookkeeping. In this case, initializing all_visible to false allows
+	 * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
 	 */
 	if (prstate->attempt_freeze)
 	{
 		prstate->all_visible = true;
 		prstate->all_frozen = true;
 	}
+	else if (prstate->attempt_update_vm)
+	{
+		prstate->all_visible = true;
+		prstate->all_frozen = false;
+	}
 	else
 	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
 		prstate->all_visible = false;
 		prstate->all_frozen = false;
 	}
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * The visibility cutoff xid is the newest xmin of live, committed tuples
+	 * older than OldestXmin on the page. This field is only kept up-to-date
+	 * if the page is all-visible. As soon as a tuple is encountered that is
+	 * not visible to all, this field is unmaintained. As long as it is
+	 * maintained, it can be used to calculate the snapshot conflict horizon
+	 * when updating the VM and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -765,10 +783,131 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and blk_known_av. Some callers may
+ * already have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along as blk_known_av.
+ * Callers that have not previously checked the page's status in the VM should
+ * pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bit and page hint. This does
+ * not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+					   BlockNumber heap_blk,
+					   Buffer heap_buf,
+					   Buffer vmbuffer,
+					   bool blk_known_av,
+					   const PruneFreezeResult *presult,
+					   uint8 *new_vmbits,
+					   bool *do_set_pd_vis)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+
+	*new_vmbits = 0;
+
+	/*
+	 * It should never be the case that the visibility map page is set while
+	 * the page-level bit is clear, but the reverse is allowed (if checksums
+	 * are not enabled).
+	 *
+	 * We avoid relying on blk_known_av as a proxy for the page-level
+	 * PD_ALL_VISIBLE bit being set, since it might have become stale and may
+	 * not be provided by all callers.
+	 */
+	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+
+	/*
+	 * Determine what the visibility map bits should be set to using the
+	 * values of all_visible and all_frozen determined during
+	 * pruning/freezing.
+	 */
+	if ((presult->all_visible && !blk_known_av) ||
+		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	{
+		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+		if (presult->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		return true;
+	}
+
+	/*
+	 * Now handle two potential corruption cases:
+	 *
+	 * These do not need to happen in a critical section and are not
+	 * WAL-logged.
+	 *
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
+	 * with buffer lock before concluding that the VM is corrupt.
+	 *
+	 * Callers which did not check the visibility map and determine
+	 * blk_known_av will not be eligible for this, however the cost of
+	 * potentially needing to read the visibility map for pages that are not
+	 * all-visible is too high to justify generalizing the check.
+	 */
+	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	{
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+						RelationGetRelationName(relation), heap_blk)));
+
+		PageClearAllVisible(heap_page);
+		MarkBufferDirty(heap_buf);
+		visibilitymap_clear(relation, heap_blk, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	return false;
+}
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -783,12 +922,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +953,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   MultiXactId *new_relmin_mxid)
 {
 	Buffer		buffer = params->buffer;
+	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
+	BlockNumber blockno = BufferGetBlockNumber(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
+	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
@@ -1005,6 +1149,51 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
 		}
 	}
+
+	/* Now update the visibility map and PD_ALL_VISIBLE hint */
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = false;
+	if (prstate.attempt_update_vm)
+		do_set_vm = heap_page_will_set_vis(params->relation,
+										   blockno,
+										   buffer,
+										   vmbuffer,
+										   params->blk_known_av,
+										   presult,
+										   &presult->new_vmbits,
+										   &do_set_pd_vis);
+
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || presult->new_vmbits == 0);
+
+	if (do_set_pd_vis)
+	{
+		/*
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		MarkBufferDirty(buffer);
+		PageSetAllVisible(page);
+	}
+
+	presult->old_vmbits = 0;
+	if (do_set_vm)
+		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+												InvalidXLogRecPtr,
+												vmbuffer, presult->vm_conflict_horizon,
+												presult->new_vmbits);
 }
 
 
@@ -1479,6 +1668,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 			{
 				TransactionId xmin;
 
+				Assert(prstate->attempt_update_vm);
+
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
 					prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1cca095841e..f5617335cb2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1935,116 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 }
 
 
-/*
- * Decide whether to set the visibility map bits for heap_blk, using
- * information from PruneFreezeResult and all_visible_according_to_vm. This
- * function does not actually set the VM bit or page-level hint,
- * PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bit and page hint. This does not need to be
- * done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with the desired
- * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
- * PD_ALL_VISIBLE should be set on the heap page.
- */
-static bool
-heap_page_will_set_vis(Relation relation,
-					   BlockNumber heap_blk,
-					   Buffer heap_buf,
-					   Buffer vmbuffer,
-					   bool all_visible_according_to_vm,
-					   const PruneFreezeResult *presult,
-					   uint8 *new_vmbits,
-					   bool *do_set_pd_vis)
-{
-	Page		heap_page = BufferGetPage(heap_buf);
-
-	*new_vmbits = 0;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear, but the reverse is allowed (if checksums
-	 * are not enabled).
-	 *
-	 * We avoid relying on all_visible_according_to_vm as a proxy for the
-	 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-	 * stale.
-	 */
-	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
-
-	/*
-	 * Determine what to set the visibility map bits to based on information
-	 * from the VM (as of last heap_vac_scan_next_block() call), and from
-	 * all_visible and all_frozen variables.
-	 */
-	if ((presult->all_visible && !all_visible_according_to_vm) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
-	{
-		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
-			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
-
-		return true;
-	}
-
-	/*
-	 * Now handle two potential corruption cases:
-	 *
-	 * These do not need to happen in a critical section and are not
-	 * WAL-logged.
-	 *
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
-			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(relation), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buf);
-		visibilitymap_clear(relation, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-	}
-
-	return false;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2075,16 +1965,14 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
-	bool		do_set_vm = false;
-	bool		do_set_pd_vis = false;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
+		.blk_known_av = all_visible_according_to_vm,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
@@ -2187,60 +2075,24 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	do_set_vm = heap_page_will_set_vis(rel,
-									   blkno,
-									   buf,
-									   vmbuffer,
-									   all_visible_according_to_vm,
-									   &presult,
-									   &new_vmbits,
-									   &do_set_pd_vis);
-
-
-	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
-	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
-
-	if (do_set_pd_vis)
-	{
-		/*
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		MarkBufferDirty(buf);
-		PageSetAllVisible(page);
-	}
-
-	if (do_set_vm)
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   new_vmbits);
-
 	/*
 	 * For the purposes of logging, count whether or not the page was newly
 	 * set all-visible and, potentially, all-frozen.
 	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
-		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
 		vacrel->vm_new_visible_pages++;
-		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 		{
 			vacrel->vm_new_visible_frozen_pages++;
 			*vm_page_frozen = true;
 		}
 	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
 	{
-		Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+		Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
 		vacrel->vm_new_frozen_pages++;
 		*vm_page_frozen = true;
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..ce9cfbdc767 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS			(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * vmbuffer is the buffer that must already contain the required block of
+	 * the visibility map if we are to update it. blk_known_av is the
+	 * visibility status of the heap block as of the last call to
+	 * find_next_unskippable_block(). Callers which did not check the
+	 * visibility map already should pass false for blk_known_av. This is only
+	 * an optimization for callers that did check the VM and won't affect
+	 * correctness.
+	 */
+	Buffer		vmbuffer;
+	bool		blk_known_av;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
 	 *
 	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
 	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 *
+	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+	 * in the VM.
 	 */
 	int			options;
 
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * old_vmbits are the state of the all-visible and all-frozen bits in the
+	 * visibility map before updating it during phase I of vacuuming.
+	 * new_vmbits are the state of those bits after phase I of vacuuming.
+	 *
+	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+	 * we have attempted to update the VM.
+	 */
+	uint8		new_vmbits;
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v23-0004-Move-VM-assert-into-prune-freeze-code.patch (14.8K, 5-v23-0004-Move-VM-assert-into-prune-freeze-code.patch)
  download | inline diff:
From 9aa0ec2b5fae04762128fbec329a23139fb5b4a4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v23 04/14] Move VM assert into prune/freeze code

For review only, this commit moves the check of the heap page into
prune/freeze code before setting the VM. This allows us to remove some
fields of the PruneFreezeResult.

This will get squashed into a larger commit to set the VM in the same
record where we prune and freeze.
---
 src/backend/access/heap/pruneheap.c  | 142 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c |  68 +------------
 src/include/access/heapam.h          |  25 ++---
 3 files changed, 111 insertions(+), 124 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0daf3abf717..2512b5d83e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneFreezeResult *presult,
+								   const PruneState *prstate,
 								   uint8 *new_vmbits,
 								   bool *do_set_pd_vis);
 
@@ -785,8 +785,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 
 /*
  * Decide whether to set the visibility map bits for heap_blk, using
- * information from PruneFreezeResult and blk_known_av. Some callers may
- * already have examined this page’s VM bits (e.g., VACUUM in the previous
+ * information from PruneState and blk_known_av. Some callers may already have
+ * examined this page’s VM bits (e.g., VACUUM in the previous
  * heap_vac_scan_next_block() call) and can pass that along as blk_known_av.
  * Callers that have not previously checked the page's status in the VM should
  * pass false for blk_known_av.
@@ -808,13 +808,20 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneFreezeResult *presult,
+					   const PruneState *prstate,
 					   uint8 *new_vmbits,
 					   bool *do_set_pd_vis)
 {
 	Page		heap_page = BufferGetPage(heap_buf);
 
 	*new_vmbits = 0;
+	*do_set_pd_vis = false;
+
+	if (!prstate->attempt_update_vm)
+	{
+		Assert(!prstate->all_visible && !prstate->all_frozen);
+		return false;
+	}
 
 	/*
 	 * It should never be the case that the visibility map page is set while
@@ -825,22 +832,19 @@ heap_page_will_set_vis(Relation relation,
 	 * PD_ALL_VISIBLE bit being set, since it might have become stale and may
 	 * not be provided by all callers.
 	 */
-	*do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+	*do_set_pd_vis = prstate->all_visible & !PageIsAllVisible(heap_page);
 
 	/*
 	 * Determine what the visibility map bits should be set to using the
 	 * values of all_visible and all_frozen determined during
 	 * pruning/freezing.
 	 */
-	if ((presult->all_visible && !blk_known_av) ||
-		(presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+	if ((prstate->all_visible && !blk_known_av) ||
+		(prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
 	{
 		*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-		if (presult->all_frozen)
-		{
-			Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+		if (prstate->all_frozen)
 			*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-		}
 
 		return true;
 	}
@@ -887,7 +891,7 @@ heap_page_will_set_vis(Relation relation,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+	else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
 	{
 		ereport(WARNING,
 				(errcode(ERRCODE_DATA_CORRUPTED),
@@ -903,6 +907,30 @@ heap_page_will_set_vis(Relation relation,
 	return false;
 }
 
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+						 TransactionId OldestXmin,
+						 bool *all_frozen,
+						 TransactionId *visibility_cutoff_xid,
+						 OffsetNumber *logging_offnum)
+{
+
+	return heap_page_would_be_all_visible(rel, buf,
+										  OldestXmin,
+										  NULL, 0,
+										  all_frozen,
+										  visibility_cutoff_xid,
+										  logging_offnum);
+}
+#endif
+
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -956,6 +984,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -1116,23 +1145,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
 	presult->hastup = prstate.hastup;
 
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
@@ -1150,20 +1164,68 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		}
 	}
 
-	/* Now update the visibility map and PD_ALL_VISIBLE hint */
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+	/*
+	 * If updating the visibility map, the conflict horizon for that record
+	 * must be the newest xmin on the page.  However, if the page is
+	 * completely frozen, there can be no conflict and the vm_conflict_horizon
+	 * should remain InvalidTransactionId.  This includes the case that we
+	 * just froze all the tuples; the prune-freeze record included the
+	 * conflict XID already so we don't need to again.
+	 */
+	if (prstate.all_frozen)
+		vm_conflict_horizon = InvalidTransactionId;
+	else
+		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(presult->lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(params->relation, buffer,
+										prstate.cutoffs->OldestXmin,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
 
-	do_set_vm = false;
-	if (prstate.attempt_update_vm)
-		do_set_vm = heap_page_will_set_vis(params->relation,
-										   blockno,
-										   buffer,
-										   vmbuffer,
-										   params->blk_known_av,
-										   presult,
-										   &presult->new_vmbits,
-										   &do_set_pd_vis);
+		Assert(prstate.all_frozen == debug_all_frozen);
 
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == vm_conflict_horizon);
+	}
+#endif
+
+	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno,
+									   buffer,
+									   vmbuffer,
+									   params->blk_known_av,
+									   &prstate,
+									   &presult->new_vmbits,
+									   &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
 	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
@@ -1192,7 +1254,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (do_set_vm)
 		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
 												InvalidXLogRecPtr,
-												vmbuffer, presult->vm_conflict_horizon,
+												vmbuffer, vm_conflict_horizon,
 												presult->new_vmbits);
 }
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f5617335cb2..4aa425ec945 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
-									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
-									 OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
-										   OffsetNumber *deadoffsets,
-										   int ndeadoffsets,
-										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
-										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
 static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2016,32 +2002,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -3496,29 +3456,6 @@ dead_items_cleanup(LVRelState *vacrel)
 	vacrel->pvs = NULL;
 }
 
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
-						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
-						 OffsetNumber *logging_offnum)
-{
-
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
-										  NULL, 0,
-										  all_frozen,
-										  visibility_cutoff_xid,
-										  logging_offnum);
-}
-#endif
 
 /*
  * Check whether the heap page in buf is all-visible except for the dead
@@ -3542,15 +3479,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
  * This logic is closely related to heap_prune_record_unchanged_lp_normal().
  * If you modify this function, ensure consistency with that code. An
  * assertion cross-checks that both remain in agreement. Do not introduce new
  * side-effects.
  */
-static bool
+bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   TransactionId OldestXmin,
 							   OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce9cfbdc767..b20096b6ca1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
 	 *
 	 * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
 	 * in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
 	int			live_tuples;
 	int			recently_dead_tuples;
 
-	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
 	/*
 	 * old_vmbits are the state of the all-visible and all-frozen bits in the
 	 * visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+										   TransactionId OldestXmin,
+										   OffsetNumber *deadoffsets,
+										   int ndeadoffsets,
+										   bool *all_frozen,
+										   TransactionId *visibility_cutoff_xid,
+										   OffsetNumber *logging_offnum);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (21.4K, 6-v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
  download | inline diff:
From a407673cb2632d4544cc56458dbf4a063da2067c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v23 05/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam_xlog.c |  48 +++--
 src/backend/access/heap/pruneheap.c   | 294 +++++++++++++++-----------
 src/backend/access/heap/vacuumlazy.c  |   1 +
 src/include/access/heapam.h           |   1 +
 4 files changed, 212 insertions(+), 132 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..b1ceab71928 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		OffsetNumber *frz_offsets;
 		char	   *dataptr = XLogRecGetBlockData(record, 0, &datalen);
 		bool		do_prune;
+		bool		set_lsn = false;
+		bool		mark_buffer_dirty = false;
 
 		heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
 											   &nplans, &plans, &frz_offsets,
@@ -157,17 +159,39 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		/* There should be no more data */
 		Assert((char *) frz_offsets == dataptr + datalen);
 
-		if (vmflags & VISIBILITYMAP_VALID_BITS)
-			PageSetAllVisible(page);
-
-		MarkBufferDirty(buffer);
+		if (do_prune || nplans > 0)
+			mark_buffer_dirty = set_lsn = true;
 
 		/*
-		 * See log_heap_prune_and_freeze() for commentary on when we set the
-		 * heap page LSN.
+		 * The critical integrity requirement here is that we must never end
+		 * up with the visibility map bit set and the page-level
+		 * PD_ALL_VISIBLE bit unset.  If that were to occur, a subsequent page
+		 * modification would fail to clear the visibility map bit.
+		 *
+		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+		 * marking an all-visible page all-frozen). If only the VM is updated,
+		 * the heap page need not be dirtied.
 		 */
-		if (do_prune || nplans > 0 ||
-			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			mark_buffer_dirty = true;
+
+			/*
+			 * See log_heap_prune_and_freeze() for commentary on when we set
+			 * the heap page LSN.
+			 */
+			if (XLogHintBitIsNeeded())
+				set_lsn = true;
+		}
+
+		/* We should always mark a buffer dirty before stamping with an LSN */
+		Assert(!set_lsn || mark_buffer_dirty);
+
+		if (mark_buffer_dirty)
+			MarkBufferDirty(buffer);
+
+		if (set_lsn)
 			PageSetLSN(page, lsn);
 
 		/*
@@ -246,10 +270,10 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 /*
  * Replay XLOG_HEAP2_VISIBLE records.
  *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * The critical integrity requirement here is that we must never end up with a
+ * situation where the visibility map bit is set, and the page-level
+ * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent page
+ * modification would fail to clear the visibility map bit.
  */
 static void
 heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2512b5d83e3..b851d723c74 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
 static bool heap_page_will_freeze(Relation relation, Buffer buffer,
 								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId frz_conflict_horizon,
+									  TransactionId visibility_cutoff_xid,
+									  bool blk_already_av);
 static bool heap_page_will_set_vis(Relation relation,
 								   BlockNumber heap_blk,
 								   Buffer heap_buf,
@@ -783,6 +789,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+				 TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+	TransactionId conflict_xid;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications.  If this record will prune tuples, any transactions on
+	 * the standby older than the youngest xmax of the most recently removed
+	 * tuple this record will prune will conflict.  If this record will freeze
+	 * tuples, any transactions on the standby with xids older than the
+	 * youngest tuple this record will freeze will conflict.
+	 */
+	conflict_xid = InvalidTransactionId;
+
+	/*
+	 * If we are updating the VM, the conflict horizon is almost always the
+	 * visibility cutoff XID.
+	 *
+	 * Separately, if we are freezing any tuples, as an optimization, we can
+	 * use the visibility_cutoff_xid as the conflict horizon if the page will
+	 * be all-frozen. This is true even if there are LP_DEAD line pointers
+	 * because we ignored those when maintaining the visibility_cutoff_xid.
+	 * This will have been calculated earlier as the frz_conflict_horizon when
+	 * we determined we would freeze.
+	 */
+	if (do_set_vm)
+		conflict_xid = visibility_cutoff_xid;
+	else if (do_freeze)
+		conflict_xid = frz_conflict_horizon;
+
+	/*
+	 * If we are removing tuples with a younger xmax than our so far
+	 * calculated conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+		conflict_xid = latest_xid_removed;
+
+	return conflict_xid;
+}
+
 /*
  * Decide whether to set the visibility map bits for heap_blk, using
  * information from PruneState and blk_known_av. Some callers may already have
@@ -984,7 +1048,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	TransactionId vm_conflict_horizon = InvalidTransactionId;
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -993,6 +1056,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_set_pd_vis;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
+	uint8		new_vmbits = 0;
+	uint8		old_vmbits = 0;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1058,6 +1124,39 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.all_visible = prstate.all_frozen = false;
 
 	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+	/*
+	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+	 * based on information from the VM and the all_visible/all_frozen flags.
+	 *
+	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+	 * VM bit is clear, we strongly prefer to keep them in sync.
+	 *
+	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+	 * already been set. Setting only the VM is most common when setting an
+	 * already all-visible page all-frozen.
+	 */
+	do_set_vm = heap_page_will_set_vis(params->relation,
+									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   &prstate, &new_vmbits, &do_set_pd_vis);
+
+	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+									prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+									prstate.visibility_cutoff_xid, params->blk_known_av);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1079,14 +1178,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_pd_vis)
 			MarkBufferDirtyHint(buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1100,36 +1202,33 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		if (do_set_pd_vis)
+			PageSetAllVisible(page);
+
+		if (do_prune || do_freeze || do_set_pd_vis)
+			MarkBufferDirty(buffer);
+
+		if (do_set_vm)
+		{
+			Assert(PageIsAllVisible(page));
+			old_vmbits = visibilitymap_set_vmbits(blockno,
+												  vmbuffer, new_vmbits,
+												  params->relation->rd_locator);
+			Assert(old_vmbits != new_vmbits);
+		}
 
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
 		if (RelationNeedsWAL(params->relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(params->relation, buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? vmbuffer : InvalidBuffer,
+									  do_set_vm ? new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  do_set_pd_vis,
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1139,43 +1238,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
-	/* Copy information back for caller */
-	presult->ndeleted = prstate.ndeleted;
-	presult->nnewlpdead = prstate.ndead;
-	presult->nfrozen = prstate.nfrozen;
-	presult->live_tuples = prstate.live_tuples;
-	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->hastup = prstate.hastup;
-
-	presult->lpdead_items = prstate.lpdead_items;
-	/* the presult->deadoffsets array was already filled in */
-
-	if (prstate.attempt_freeze)
-	{
-		if (presult->nfrozen > 0)
-		{
-			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
-		}
-		else
-		{
-			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
-			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
-		}
-	}
-
-	/*
-	 * If updating the visibility map, the conflict horizon for that record
-	 * must be the newest xmin on the page.  However, if the page is
-	 * completely frozen, there can be no conflict and the vm_conflict_horizon
-	 * should remain InvalidTransactionId.  This includes the case that we
-	 * just froze all the tuples; the prune-freeze record included the
-	 * conflict XID already so we don't need to again.
-	 */
-	if (prstate.all_frozen)
-		vm_conflict_horizon = InvalidTransactionId;
-	else
-		vm_conflict_horizon = prstate.visibility_cutoff_xid;
+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
 
 	/*
 	 * During its second pass over the heap, VACUUM calls
@@ -1190,7 +1254,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		TransactionId debug_cutoff;
 		bool		debug_all_frozen;
 
-		Assert(presult->lpdead_items == 0);
+		Assert(prstate.lpdead_items == 0);
+		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
 										prstate.cutoffs->OldestXmin,
@@ -1200,62 +1265,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		Assert(prstate.all_frozen == debug_all_frozen);
 
 		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == vm_conflict_horizon);
+			   debug_cutoff == prstate.visibility_cutoff_xid);
 	}
 #endif
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
-	Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
-	/*
-	 * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
-	 * based on information from the VM and the all_visible/all_frozen flags.
-	 *
-	 * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
-	 * VM bit is clear, we strongly prefer to keep them in sync.
-	 *
-	 * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
-	 * already been set. Setting only the VM is most common when setting an
-	 * already all-visible page all-frozen.
-	 */
-	do_set_vm = heap_page_will_set_vis(params->relation,
-									   blockno,
-									   buffer,
-									   vmbuffer,
-									   params->blk_known_av,
-									   &prstate,
-									   &presult->new_vmbits,
-									   &do_set_pd_vis);
-
-	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
-	Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+	/* Copy information back for caller */
+	presult->ndeleted = prstate.ndeleted;
+	presult->nnewlpdead = prstate.ndead;
+	presult->nfrozen = prstate.nfrozen;
+	presult->live_tuples = prstate.live_tuples;
+	presult->recently_dead_tuples = prstate.recently_dead_tuples;
+	presult->hastup = prstate.hastup;
+	presult->new_vmbits = new_vmbits;
+	presult->old_vmbits = old_vmbits;
 
-	/*
-	 * new_vmbits should be 0 regardless of whether or not the page is
-	 * all-visible if we do not intend to set the VM.
-	 */
-	Assert(do_set_vm || presult->new_vmbits == 0);
+	presult->lpdead_items = prstate.lpdead_items;
+	/* the presult->deadoffsets array was already filled in */
 
-	if (do_set_pd_vis)
+	if (prstate.attempt_freeze)
 	{
-		/*
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		MarkBufferDirty(buffer);
-		PageSetAllVisible(page);
+		if (presult->nfrozen > 0)
+		{
+			*new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+		}
+		else
+		{
+			*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+			*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+		}
 	}
-
-	presult->old_vmbits = 0;
-	if (do_set_vm)
-		presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
-												InvalidXLogRecPtr,
-												vmbuffer, vm_conflict_horizon,
-												presult->new_vmbits);
 }
 
 
@@ -2387,14 +2426,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  *
  * This is used for several different page maintenance operations:
  *
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
  *   redirected, some marked dead, and some removed altogether.
  *
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
+ *
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ *   marked as unused.
  *
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ *   all-visible and all-frozen.
  *
- * They have enough commonalities that we use a single WAL record for them
+ * These changes all happen together, so we use a single WAL record for them
  * all.
  *
  * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2406,6 +2449,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * case, vmbuffer should already have been updated and marked dirty and should
  * still be pinned and locked.
  *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
  * Note: This function scribbles on the 'frozen' array.
  *
  * Note: This is called in a critical section, so careful what you do here.
@@ -2415,6 +2467,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 						  Buffer vmbuffer, uint8 vmflags,
 						  TransactionId conflict_xid,
 						  bool cleanup_lock,
+						  bool set_pd_all_vis,
 						  PruneReason reason,
 						  HeapTupleFreeze *frozen, int nfrozen,
 						  OffsetNumber *redirected, int nredirected,
@@ -2451,7 +2504,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	if (!do_prune &&
 		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+		(!set_pd_all_vis || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2569,7 +2622,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	 * See comment at the top of the function about regbuf_flags_heap for
 	 * details on when we can advance the page LSN.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (do_prune || nfrozen > 0 ||
+		(set_pd_all_vis && XLogHintBitIsNeeded()))
 	{
 		Assert(BufferIsDirty(buffer));
 		PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4aa425ec945..0d39d57115d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2776,6 +2776,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 								  vmflags,
 								  conflict_xid,
 								  false,	/* no cleanup lock required */
+								  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
 								  PRUNE_VACUUM_CLEANUP,
 								  NULL, 0,	/* frozen */
 								  NULL, 0,	/* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b20096b6ca1..14c1d92604d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -435,6 +435,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 									  Buffer vmbuffer, uint8 vmflags,
 									  TransactionId conflict_xid,
 									  bool cleanup_lock,
+									  bool set_pd_all_vis,
 									  PruneReason reason,
 									  HeapTupleFreeze *frozen, int nfrozen,
 									  OffsetNumber *redirected, int nredirected,
-- 
2.43.0



  [text/x-patch] v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
  download | inline diff:
From fd1518bb82741f8b0e554206c2e35a64bf12fbc3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v23 06/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0d39d57115d..d03442abcc1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  true, /* set_pd_all_vis */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.4K, 8-v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
  download | inline diff:
From faf079025c6354fb2fcb0695da29118c476ae4dd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v23 07/14] Remove XLOG_HEAP2_VISIBLE entirely

As no remaining users emit XLOG_HEAP2_VISIBLE records.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 155 ++---------------------
 src/backend/access/heap/pruneheap.c      |   6 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 112 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  28 +---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 46 insertions(+), 375 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..3ad78ba4694 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		else if (all_frozen_set)
 		{
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b1ceab71928..f0de2c136a0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -254,7 +254,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -267,142 +267,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with a
- * situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent page
- * modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -780,8 +644,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -793,11 +657,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1378,9 +1242,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b851d723c74..7a778ad3bad 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1211,9 +1211,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_set_vm)
 		{
 			Assert(PageIsAllVisible(page));
-			old_vmbits = visibilitymap_set_vmbits(blockno,
-												  vmbuffer, new_vmbits,
-												  params->relation->rd_locator);
+			old_vmbits = visibilitymap_set(blockno,
+										   vmbuffer, new_vmbits,
+										   params->relation->rd_locator);
 			Assert(old_vmbits != new_vmbits);
 		}
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d03442abcc1..5d88a1592e3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 				log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2781,9 +2781,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 * set PD_ALL_VISIBLE.
 		 */
 		PageSetAllVisible(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = visibility_cutoff_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-	return status;
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
 
 #define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
 
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
 #define		XLHP_IS_CATALOG_REL			(1 << 1)
 
 /*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
-							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
-							   Buffer vmBuf,
-							   TransactionId cutoff_xid,
-							   uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
-									  Buffer vmBuf, uint8 flags,
-									  const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+							   Buffer vmBuf, uint8 flags,
+							   const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cf3f6a7dafd..a139705de01 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4302,7 +4302,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (8.1K, 9-v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
  download | inline diff:
From 221a5db2d8a7cee07053c30f635be0b27bae2242 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v23 08/14] Rename GlobalVisTestIsRemovableXid() to
 GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.

The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c |  6 +++---
 src/backend/access/heap/pruneheap.c         | 12 ++++++------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 17 ++++++++---------
 src/include/utils/snapmgr.h                 |  4 ++--
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 		return false;
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
-	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+	return GlobalVisXidVisibleToAll(vistest,
+									HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7a778ad3bad..1a3c7cf1ef5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -253,7 +253,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
 		return;
 
 	/*
@@ -487,7 +487,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 	 * Determining HTSV only once for each tuple is required for correctness,
 	 * to deal with cases where running HTSV twice could result in different
 	 * results.  For example, RECENTLY_DEAD can turn to DEAD if another
-	 * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+	 * checked item causes GlobalVisFullXidVisibleToAll() to update the
 	 * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
 	 * transaction aborts.
 	 *
@@ -1327,11 +1327,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	 * Determine whether or not the tuple is considered dead when compared
 	 * with the provided GlobalVisState. On-access pruning does not provide
 	 * VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
-	 * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
-	 * if the GlobalVisState has been updated since the beginning of vacuuming
+	 * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+	 * the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1790,7 +1790,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
+				 * is requested. We could use GlobalVisXidVisibleToAll()
 				 * instead, if a non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisXidVisibleToAll(vistest, dt->xid)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
  * See comment for GlobalVisState for details.
  */
 bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 }
 
 /*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
  *
  * It is crucial that this only gets called for xids from a source that
  * protects against xid wraparounds (e.g. from a table and thus protected by
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
 {
 	FullTransactionId fxid;
 
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
  */
 bool
 GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisFullXidVisibleToAll(state, fxid);
 }
 
 /*
  * Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisTestIsVisibleXid(), see their comments.
  */
 bool
 GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisXidVisibleToAll(state, xid);
 }
 
 /*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (10.6K, 10-v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
  download | inline diff:
From c75f7d1281fadf5c49e37577ef42ff96b92b3f59 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v23 09/14] Use GlobalVisState in vacuum to determine page
 level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.

This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
 src/backend/access/heap/pruneheap.c         | 43 +++++++++------------
 src/backend/access/heap/vacuumlazy.c        | 10 ++---
 src/include/access/heapam.h                 | 11 +++---
 4 files changed, 58 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD Or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+								  Buffer buffer)
+{
+	TransactionId dead_after = InvalidTransactionId;
+	HTSV_Result res;
+
+	res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+	if (res == HEAPTUPLE_RECENTLY_DEAD)
+	{
+		Assert(TransactionIdIsValid(dead_after));
+
+		if (GlobalVisXidVisibleToAll(vistest, dead_after))
+			res = HEAPTUPLE_DEAD;
+	}
+	else
+		Assert(!TransactionIdIsValid(dead_after));
+
+	return res;
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1a3c7cf1ef5..d836bbeaf52 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -450,11 +450,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * The visibility cutoff xid is the newest xmin of live, committed tuples
-	 * older than OldestXmin on the page. This field is only kept up-to-date
-	 * if the page is all-visible. As soon as a tuple is encountered that is
-	 * not visible to all, this field is unmaintained. As long as it is
-	 * maintained, it can be used to calculate the snapshot conflict horizon
-	 * when updating the VM and/or freezing all the tuples on the page.
+	 * on the page older than the visibility horizon represented in the
+	 * GlobalVisState. This field is only kept up-to-date if the page is
+	 * all-visible. As soon as a tuple is encountered that is not visible to
+	 * all, this field is unmaintained. As long as it is maintained, it can be
+	 * used to calculate the snapshot conflict horizon when updating the VM
+	 * and/or freezing all the tuples on the page.
 	 */
 	prstate->visibility_cutoff_xid = InvalidTransactionId;
 }
@@ -980,14 +981,13 @@ heap_page_will_set_vis(Relation relation,
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
-	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+	return heap_page_would_be_all_visible(rel, buf, vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -1078,6 +1078,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	prune_freeze_plan(RelationGetRelid(params->relation),
 					  buffer, &prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them is not visible to everyone, the page cannot be
+	 * all-visible.
+	 */
+	if (prstate.all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+		prstate.all_visible = prstate.all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1255,10 +1265,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		bool		debug_all_frozen;
 
 		Assert(prstate.lpdead_items == 0);
-		Assert(prstate.cutoffs);
 
 		Assert(heap_page_is_all_visible(params->relation, buffer,
-										prstate.cutoffs->OldestXmin,
+										prstate.vistest,
 										&debug_all_frozen,
 										&debug_cutoff, off_loc));
 
@@ -1787,20 +1796,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisXidVisibleToAll()
-				 * instead, if a non-freezing caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5d88a1592e3..c0e1350cb11 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2735,7 +2735,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3495,7 +3495,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3511,7 +3511,7 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 
 		/* Visibility checks may do IO or allocate memory */
 		Assert(CritSectionCount == 0);
-		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
 		{
 			case HEAPTUPLE_LIVE:
 				{
@@ -3604,7 +3604,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					 * that everyone sees it as committed?
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					if (!GlobalVisXidVisibleToAll(vistest, xmin))
 					{
 						all_visible = false;
 						*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 14c1d92604d..4702ec00dea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Contains the cutoffs used for freezing. They are required if the
-	 * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
-	 * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
-	 * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
-	 * calculates them once, at the beginning of vacuuming the relation.
+	 * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+	 * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+	 * relation.
 	 */
 	struct VacuumCutoffs *cutoffs;
 } PruneFreezeParams;
@@ -446,7 +445,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
 extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -460,6 +459,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+													 GlobalVisState *vistest, Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v23-0010-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 11-v23-0010-Unset-all_visible-sooner-if-not-freezing.patch)
  download | inline diff:
From 23236fd69abba5d481ff228d0fe1486ba40eddf3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v23 10/14] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d836bbeaf52..03b8ddcc38d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1653,8 +1653,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	/*
 	 * Deliberately delay unsetting all_visible and all_frozen until later
 	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * page. If we won't attempt freezing, just unset all-visible now, though.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1913,8 +1918,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
 	 * Similarly, don't unset all_visible and all_frozen until later, at the
 	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
 	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * the visibility map, this will be correct. If we won't attempt freezing,
+	 * though, just unset all-visible now.
 	 */
+	if (!prstate->attempt_freeze)
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+	}
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-- 
2.43.0



  [text/x-patch] v23-0011-Track-which-relations-are-modified-by-a-query.patch (2.5K, 12-v23-0011-Track-which-relations-are-modified-by-a-query.patch)
  download | inline diff:
From be0f2805786ba0d4711c39fe4896a3d6f51feba1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v23 11/14] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
 src/backend/executor/execMain.c  | 4 ++++
 src/backend/executor/execUtils.c | 2 ++
 src/include/nodes/execnodes.h    | 6 ++++++
 3 files changed, 12 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v23-0012-Pass-down-information-on-table-modification-to-s.patch (23.0K, 13-v23-0012-Pass-down-information-on-table-modification-to-s.patch)
  download | inline diff:
From 804296ddbb6b3553d37492d2f79d034df71fd3e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v23 12/14] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  7 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 +++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  4 ++--
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++----
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      | 11 ++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              | 19 ++++++++++-------
 23 files changed, 91 insertions(+), 44 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index f87c60a230c..645688f9241 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..d7fac94826d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c96917085c2..9d425504e1b 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	}
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..58dbbf4d851 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 47d5047fe8b..055759cd343 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index dd323c9b9fd..b41bfeca244 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4702ec00dea..fc2c8314e97 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
 
 	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the rel */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
 									   NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (13.9K, 14-v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
  download | inline diff:
From 422613795deaaef2bdd43cd0767e019cbdd44f50 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v23 13/14] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 15 +++-
 src/backend/access/heap/heapam_handler.c      | 15 +++-
 src/backend/access/heap/pruneheap.c           | 78 ++++++++++++++++---
 src/include/access/heapam.h                   | 24 +++++-
 .../t/035_standby_logical_decoding.pl         |  3 +-
 5 files changed, 116 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3ad78ba4694..ecc04390ac7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	Buffer		buffer = scan->rs_cbuf;
 	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
+	Buffer	   *vmbuffer = NULL;
 	Page		page;
 	int			lines;
 	bool		all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &scan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d7fac94826d..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
+	Buffer	   *vmbuffer = NULL;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 	Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+		vmbuffer = &hscan->rs_vmbuffer;
+	heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 03b8ddcc38d..04f10054402 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vis(Relation relation,
 								   Buffer heap_buf,
 								   Buffer vmbuffer,
 								   bool blk_known_av,
-								   const PruneState *prstate,
+								   PruneReason reason,
+								   bool do_prune, bool do_freeze,
+								   PruneState *prstate,
 								   uint8 *new_vmbits,
 								   bool *do_set_pd_vis);
 
@@ -221,9 +223,13 @@ static bool heap_page_will_set_vis(Relation relation,
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -305,6 +311,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 				.cutoffs = NULL,
 			};
 
+			if (vmbuffer)
+			{
+				visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+				params.options |= HEAP_PAGE_PRUNE_UPDATE_VIS;
+				params.vmbuffer = *vmbuffer;
+			}
+
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
 
@@ -863,6 +876,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
  * corrupted, it will fix them by clearing the VM bit and page hint. This does
  * not need to be done in a critical section.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set, along with the desired
  * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
  * PD_ALL_VISIBLE should be set on the heap page.
@@ -873,7 +889,9 @@ heap_page_will_set_vis(Relation relation,
 					   Buffer heap_buf,
 					   Buffer vmbuffer,
 					   bool blk_known_av,
-					   const PruneState *prstate,
+					   PruneReason reason,
+					   bool do_prune, bool do_freeze,
+					   PruneState *prstate,
 					   uint8 *new_vmbits,
 					   bool *do_set_pd_vis)
 {
@@ -888,6 +906,24 @@ heap_page_will_set_vis(Relation relation,
 		return false;
 	}
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS &&
+		prstate->all_visible &&
+		!do_prune && !do_freeze &&
+		(!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+	{
+		prstate->all_visible = false;
+		prstate->all_frozen = false;
+		return false;
+	}
+
 	/*
 	 * It should never be the case that the visibility map page is set while
 	 * the page-level bit is clear, but the reverse is allowed (if checksums
@@ -921,14 +957,15 @@ heap_page_will_set_vis(Relation relation,
 	 * WAL-logged.
 	 *
 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * page-level bit is clear.  However, it's possible that in vacuum the bit
+	 * got cleared after heap_vac_scan_next_block() was called, so we must
+	 * recheck with buffer lock before concluding that the VM is corrupt.
 	 *
 	 * Callers which did not check the visibility map and determine
 	 * blk_known_av will not be eligible for this, however the cost of
 	 * potentially needing to read the visibility map for pages that are not
-	 * all-visible is too high to justify generalizing the check.
+	 * all-visible is too high to justify generalizing the check. A future
+	 * vacuum will have to take care of fixing the corruption.
 	 */
 	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
 			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1149,6 +1186,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	do_set_vm = heap_page_will_set_vis(params->relation,
 									   blockno, buffer, vmbuffer, params->blk_known_av,
+									   params->reason, do_prune, do_freeze,
 									   &prstate, &new_vmbits, &do_set_pd_vis);
 
 	/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -1224,13 +1262,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			old_vmbits = visibilitymap_set(blockno,
 										   vmbuffer, new_vmbits,
 										   params->relation->rd_locator);
-			Assert(old_vmbits != new_vmbits);
+
+			/*
+			 * If on-access pruning set the VM in between when vacuum first
+			 * checked the visibility map and determined blk_known_av and when
+			 * we actually prune the page, we could end up trying to set the
+			 * VM only to find it is already set.
+			 */
+			if (old_vmbits == new_vmbits)
+			{
+				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+				/* Unset so we don't emit WAL since no change occured */
+				do_set_vm = false;
+			}
 		}
 
 		/*
-		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+		 * only planning to update the VM, and it turns out that it was
+		 * already set, there is no need to emit WAL. As such, we must check
+		 * that some change is required again.
 		 */
-		if (RelationNeedsWAL(params->relation))
+		if (RelationNeedsWAL(params->relation) &&
+			(do_prune || do_freeze || do_set_vm))
 		{
 			log_heap_prune_and_freeze(params->relation, buffer,
 									  do_set_vm ? vmbuffer : InvalidBuffer,
@@ -2440,8 +2494,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
  * - Reaping: During vacuum phase III, items that are already LP_DEAD are
  *   marked as unused.
  *
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- *   all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ *   may be marked all-visible and all-frozen.
  *
  * These changes all happen together, so we use a single WAL record for them
  * all.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fc2c8314e97..3f2b5eedfff 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read in the current heap page's
+	 * corresponding VM block to this buffer.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/*
+	 * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read in the current heap page's corresponding VM block to
+	 * this buffer.
+	 */
+	Buffer		xs_vmbuffer;
 
 	/*
 	 * Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v23-0014-Set-pd_prune_xid-on-insert.patch (6.7K, 15-v23-0014-Set-pd_prune_xid-on-insert.patch)
  download | inline diff:
From 59d58156716426668022402d030cf8de7fcac928 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v23 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affetcting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.

ci-os-only:
---
 src/backend/access/heap/heapam.c              | 25 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 15 ++++++++++-
 .../modules/index/expected/killtuples.out     |  6 ++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ecc04390ac7..d5f3f897dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode, though.
 	 */
+	page = BufferGetPage(buffer);
+	if (TransactionIdIsNormal(xid))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f0de2c136a0..cf62a8df67c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -465,6 +465,12 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later which may
+		 * set the page all-visible in the VM.
+		 */
+		PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -614,9 +620,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 			PageSetAllVisible(page);
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0



view thread (143+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  In-Reply-To: <CAAKRu_agCLAQ9OjmrTdJe-X=Xr7QnU4d=cfxdQGwc9jNx9w31w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox